Audio-Visual Scene-Aware Dialog and Reasoning Using Audio-Visual Transformers with Joint Student-Teacher Learning

Ankit P. Shah, Shijie Geng, Peng Gao, Anoop Cherian, Takaaki Hori, Tim K. Marks, Jonathan Le Roux, Chiori Hori. Audio-Visual Scene-Aware Dialog and Reasoning Using Audio-Visual Transformers with Joint Student-Teacher Learning. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2022, Virtual and Singapore, 23-27 May 2022, pages 7732-7736. IEEE, 2022.

Authors

Ankit P. Shah

Shijie Geng

Peng Gao

Anoop Cherian

Takaaki Hori

Tim K. Marks

Jonathan Le Roux

Chiori Hori
