Multi-Granularity Contrastive Cross-Modal Collaborative Generation for End-to-End Long-Term Video Question Answering

Ting Yu, Kunhao Fu, Jian Zhang, Qingming Huang, Jun Yu. Multi-Granularity Contrastive Cross-Modal Collaborative Generation for End-to-End Long-Term Video Question Answering. IEEE Transactions on Image Processing, 33:3115-3129, 2024.
