Sanjoy Chowdhury, Sayan Nag, Subhrajyoti Dasgupta, Jun Chen, Mohamed Elhoseiny, Ruohan Gao, Dinesh Manocha. MEERKAT: Audio-Visual Large Language Model for Grounding in Space and Time. In Ales Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol, editors, Computer Vision - ECCV 2024 - 18th European Conference, Milan, Italy, September 29-October 4, 2024, Proceedings, Part LXIV. Volume 15122 of Lecture Notes in Computer Science, pages 52-70, Springer, 2024.