Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners

Zhenhailong Wang, Manling Li, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, Ziyi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji. Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners. In Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, A. Oh, editors, Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022. 2022.

@inproceedings{WangLXZ00WY0HCB22,
  title = {Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners},
  author = {Zhenhailong Wang and Manling Li and Ruochen Xu and Luowei Zhou and Jie Lei and Xudong Lin and Shuohang Wang and Ziyi Yang and Chenguang Zhu and Derek Hoiem and Shih-Fu Chang and Mohit Bansal and Heng Ji},
  year = {2022},
  url = {http://papers.nips.cc/paper_files/paper/2022/hash/381ceeae4a1feb1abc59c773f7e61839-Abstract-Conference.html},
  researchr = {https://researchr.org/publication/WangLXZ00WY0HCB22},
  cites = {0},
  citedby = {0},
  booktitle = {Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 -- December 9, 2022},
  editor = {Sanmi Koyejo and S. Mohamed and A. Agarwal and Danielle Belgrave and K. Cho and A. Oh},
  isbn = {9781713871088},
}