Shanshan Wang, Archontis Politis, Annamaria Mesaros, Tuomas Virtanen. Self-Supervised Learning of Audio Representations From Audio-Visual Data Using Spatial Alignment. J. Sel. Topics Signal Processing, 16(6):1467-1479, 2022.