MAVAR-SE: Multi-scale Audio-Visual Association Representation Network for End-to-End Speaker Extraction

Shilong Yu, Chenhui Yang. MAVAR-SE: Multi-scale Audio-Visual Association Representation Network for End-to-End Speaker Extraction. In Stevan Rudinac, Alan Hanjalic, Cynthia C. S. Liem, Marcel Worring, Björn Þór Jónsson, Bei Liu, Yoko Yamakata, editors, MultiMedia Modeling - 30th International Conference, MMM 2024, Amsterdam, The Netherlands, January 29 - February 2, 2024, Proceedings, Part II. Volume 14555 of Lecture Notes in Computer Science, pages 227-238, Springer, 2024.
