AVQA: A Dataset for Audio-Visual Question Answering on Videos

Pinci Yang, Xin Wang, Xuguang Duan, Hong Chen, Runze Hou, Cong Jin, Wenwu Zhu. AVQA: A Dataset for Audio-Visual Question Answering on Videos. In João Magalhães, Alberto Del Bimbo, Shin'ichi Satoh, Nicu Sebe, Xavier Alameda-Pineda, Qin Jin, Vincent Oria, Laura Toni, editors, MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10 - 14, 2022, pages 3480-3491. ACM, 2022.

@inproceedings{Yang0DCHJ022,
  title = {AVQA: A Dataset for Audio-Visual Question Answering on Videos},
  author = {Pinci Yang and Xin Wang and Xuguang Duan and Hong Chen and Runze Hou and Cong Jin and Wenwu Zhu},
  year = {2022},
  doi = {10.1145/3503161.3548291},
  url = {https://doi.org/10.1145/3503161.3548291},
  researchr = {https://researchr.org/publication/Yang0DCHJ022},
  cites = {0},
  citedby = {0},
  pages = {3480--3491},
  booktitle = {MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10 - 14, 2022},
  editor = {João Magalhães and Alberto Del Bimbo and Shin'ichi Satoh and Nicu Sebe and Xavier Alameda-Pineda and Qin Jin and Vincent Oria and Laura Toni},
  publisher = {ACM},
  isbn = {978-1-4503-9203-7},
}