MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions

Mattia Soldan, Alejandro Pardo, Juan León Alcázar, Fabian Caba Heilbron, Chen Zhao, Silvio Giancola, Bernard Ghanem. MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pages 5016-5025. IEEE, 2022.