AVCLNet: Multimodal Multispeaker Tracking Network Using Audio-Visual Contrastive Learning

Yihan Li, Yidi Li, Zhenhuan Xu, Hao Guo, Mengyuan Liu, Weiwei Wan. AVCLNet: Multimodal Multispeaker Tracking Network Using Audio-Visual Contrastive Learning. CAAI Trans. Intell. Technol., 11(1):238-255, 2026.
