Hyeonggon Ryu, Seongyu Kim, Joon Son Chung, Arda Senocak. Seeing Speech and Sound: Distinguishing and Locating Audio Sources in Visual Scenes. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025, Nashville, TN, USA, June 11-15, 2025, pages 13540-13549. Computer Vision Foundation / IEEE, 2025.