ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling

Siming Yan, Min Bai, Weifeng Chen, Xiong Zhou, Qixing Huang, Li Erran Li. ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling. In Ales Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol, editors, Computer Vision - ECCV 2024 - 18th European Conference, Milan, Italy, September 29-October 4, 2024, Proceedings, Part LXI. Volume 15119 of Lecture Notes in Computer Science, pages 37-53, Springer, 2024.

Authors

Siming Yan

Min Bai

Weifeng Chen

Xiong Zhou

Qixing Huang

Li Erran Li
