Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models

Wenbin Wang, Liang Ding, Minyan Zeng, Xiabin Zhou, Li Shen, Yong Luo, Wei Yu, Dacheng Tao. Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models. In Toby Walsh, Julie Shah, Zico Kolter, editors, AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25 - March 4, 2025, Philadelphia, PA, USA. Pages 7907-7915, AAAI Press, 2025.
