Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners

Zhenhailong Wang, Manling Li, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, Ziyi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji. Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners. In Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, A. Oh, editors, Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022. 2022.

Authors

Zhenhailong Wang

Manling Li

Ruochen Xu

Luowei Zhou

Jie Lei

Xudong Lin

Shuohang Wang

Ziyi Yang

Chenguang Zhu

Derek Hoiem

Shih-Fu Chang

Mohit Bansal

Heng Ji
