MM-CARP: Multimodal Model with Cross-Modal Retrieval-Augmented and Visual Region Perception

Junhao Guo, Chenhan Fu, Guoming Wang, Rongxing Lu, Dong Chen, Siliang Tang. MM-CARP: Multimodal Model with Cross-Modal Retrieval-Augmented and Visual Region Perception. In MMM (2), pages 424-437, 2025.
