${\text{CA}^{2}\text{ST}}$: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition

Jongseo Lee, Joohyun Chang, Dongho Lee, Jinwoo Choi. ${\text{CA}^{2}\text{ST}}$: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition. IEEE Trans. Pattern Anal. Mach. Intell., 48(3):2803-2819, March 2026.
