Multimodal Pretraining for Dense Video Captioning

Gabriel Huang, Bo Pang, Zhenhai Zhu, Clara Rivera, Radu Soricut. Multimodal Pretraining for Dense Video Captioning. In Kam-Fai Wong, Kevin Knight, Hua Wu, editors, Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, AACL/IJCNLP 2020, Suzhou, China, December 4-7, 2020, pages 470-490. Association for Computational Linguistics, 2020.
