Align vision-language semantics by multi-task learning for multi-modal summarization

Chenhao Cui, Xinnian Liang, Shuangzhi Wu, Zhoujun Li. Align vision-language semantics by multi-task learning for multi-modal summarization. Neural Computing and Applications, 36(25):15653-15666, September 2024.
