TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding

Shuhuai Ren, Sishuo Chen, Shicheng Li, Xu Sun, Lu Hou. TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding. In Houda Bouamor, Juan Pino, Kalika Bali, editors, Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023, pages 932-947. Association for Computational Linguistics, 2023.
