Integrating audio-visual text generation with contrastive learning for enhanced multimodal emotion analysis

Junyi Xiang, Xianxun Zhu, Erik Cambria. Integrating audio-visual text generation with contrastive learning for enhanced multimodal emotion analysis. Information Fusion, 127:103809, 2026.
