MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions

Mattia Soldan, Alejandro Pardo, Juan León Alcázar, Fabian Caba Heilbron, Chen Zhao 0002, Silvio Giancola, Bernard Ghanem. MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022. pages 5016-5025, IEEE, 2022.

Authors

Mattia Soldan

Alejandro Pardo

Juan León Alcázar

Fabian Caba Heilbron

Chen Zhao 0002

Silvio Giancola

Bernard Ghanem
