Andrew Rouditchenko, Yuan Gong, Samuel Thomas, Leonid Karlinsky, Hilde Kuehne, Rogério Feris, James Glass. Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation. In Itshak Lapidot, Sharon Gannot, editors, 25th Annual Conference of the International Speech Communication Association, Interspeech 2024, Kos, Greece, September 1-5, 2024. ISCA, 2024.