MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration

Thomas Hayes, Songyang Zhang, Xi Yin, Guan Pang, Sasha Sheng, Harry Yang, Songwei Ge, Qiyuan Hu, Devi Parikh. MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration. In Shai Avidan, Gabriel J. Brostow, Moustapha Cissé, Giovanni Maria Farinella, Tal Hassner, editors, Computer Vision - ECCV 2022 - 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part VIII. Volume 13668 of Lecture Notes in Computer Science, pages 431-449, Springer, 2022.
