LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs

Omkar Thawakar, Dinura Dissanayake, Ketan Pravin More, Ritesh Thawkar, Ahmed Heakl, Noor Ahsan, Yuhao Li, Mohammed Zumri, Jean Lahoud, Rao Muhammad Anwer, Hisham Cholakkal, Ivan Laptev, Mubarak Shah, Fahad Shahbaz Khan, Salman H. Khan. LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar, editors, Findings of the Association for Computational Linguistics, ACL 2025, Vienna, Austria, July 27 - August 1, 2025. pages 24290-24315, Association for Computational Linguistics, 2025. [doi]

Abstract

Abstract is missing.