Cascaded Multilingual Audio-Visual Learning from Videos

Andrew Rouditchenko, Angie W. Boggust, David Harwath, Samuel Thomas, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogério Feris, Brian Kingsbury, Michael Picheny, James R. Glass. Cascaded Multilingual Audio-Visual Learning from Videos. In Hynek Hermansky, Honza Cernocký, Lukás Burget, Lori Lamel, Odette Scharenborg, Petr Motlícek, editors, Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021, pages 3006-3010. ISCA, 2021.

@inproceedings{RouditchenkoBH021,
  title = {Cascaded Multilingual Audio-Visual Learning from Videos},
  author = {Andrew Rouditchenko and Angie W. Boggust and David Harwath and Samuel Thomas and Hilde Kuehne and Brian Chen and Rameswar Panda and Rogério Feris and Brian Kingsbury and Michael Picheny and James R. Glass},
  year = {2021},
  doi = {10.21437/Interspeech.2021-1352},
  url = {https://doi.org/10.21437/Interspeech.2021-1352},
  researchr = {https://researchr.org/publication/RouditchenkoBH021},
  cites = {0},
  citedby = {0},
  pages = {3006--3010},
  booktitle = {Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021},
  editor = {Hynek Hermansky and Honza Cernocký and Lukás Burget and Lori Lamel and Odette Scharenborg and Petr Motlícek},
  publisher = {ISCA},
}