Polos: Multimodal Metric Learning from Human Feedback for Image Captioning

Yuiga Wada, Kanta Kaneda, Daichi Saito, Komei Sugiura. Polos: Multimodal Metric Learning from Human Feedback for Image Captioning. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024, Seattle, WA, USA, June 16-22, 2024, pages 13559-13568. IEEE, 2024.
