Describe Anything: Detailed Localized Image and Video Captioning

Long Lian, Yifan Ding, Yunhao Ge, Sifei Liu, Hanzi Mao, Boyi Li, Marco Pavone, Ming-Yu Liu, Trevor Darrell, Adam Yala, Yin Cui. Describe Anything: Detailed Localized Image and Video Captioning. In IEEE/CVF International Conference on Computer Vision, ICCV 2025, Honolulu, HI, USA, October 19-25, 2025. Pages 21766-21777, IEEE, 2025.
