AVLnet: Learning Audio-Visual Language Representations from Instructional Videos

Andrew Rouditchenko, Angie W. Boggust, David Harwath, Brian Chen, Dhiraj Joshi, Samuel Thomas, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogério Schmidt Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba, James R. Glass. AVLnet: Learning Audio-Visual Language Representations from Instructional Videos. In Hynek Hermansky, Honza Cernocký, Lukás Burget, Lori Lamel, Odette Scharenborg, Petr Motlícek, editors, Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021. Pages 1584-1588, ISCA, 2021.
