GLaMM: Pixel Grounding Large Multimodal Model

Hanoona Abdul Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly, Abdelrahman M. Shaker, Salman H. Khan, Hisham Cholakkal, Rao Muhammad Anwer, Eric P. Xing, Ming-Hsuan Yang, Fahad Shahbaz Khan. GLaMM: Pixel Grounding Large Multimodal Model. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024, Seattle, WA, USA, June 16-22, 2024. Pages 13009-13018, IEEE, 2024.
