Let's Think Frame by Frame with VIP: A Video Infilling and Prediction Dataset for Evaluating Video Chain-of-Thought

Vaishnavi Himakunthala, Andy Ouyang, Daniel Rose, Ryan He, Alex Mei, Yujie Lu, Chinmay Sonar, Michael Saxon, William Yang Wang. Let's Think Frame by Frame with VIP: A Video Infilling and Prediction Dataset for Evaluating Video Chain-of-Thought. In Houda Bouamor, Juan Pino, Kalika Bali, editors, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023, pages 204-219. Association for Computational Linguistics, 2023.

Authors

Vaishnavi Himakunthala

Andy Ouyang

Daniel Rose

Ryan He

Alex Mei

Yujie Lu

Chinmay Sonar

Michael Saxon

William Yang Wang
