Large-Small Model Synergy with Multimodal Fine-Grained Heuristics for Knowledge-Based Visual Question Answering

Zhongfan Sun, Kan Guo, Yongli Hu, Daxin Tian, Qingqing Gao, Jiapu Wang, Junbin Gao, Yanfeng Sun, Baocai Yin. Large-Small Model Synergy with Multimodal Fine-Grained Heuristics for Knowledge-Based Visual Question Answering. In Cathal Gurrin, Klaus Schoeffmann, Min Zhang, Luca Rossetto, Stevan Rudinac, Duc-Tien Dang-Nguyen, Wen-Huang Cheng, Phoebe Chen, Jenny Benois-Pineau, editors, Proceedings of the 33rd ACM International Conference on Multimedia, MM 2025, Dublin, Ireland, October 27-31, 2025, pages 935-944. ACM, 2025.
