An Empirical Study of Multilingual Scene-Text Visual Question Answering

Lin Li 0001, Haohan Zhang, Zeqin Fang. An Empirical Study of Multilingual Scene-Text Visual Question Answering. In Mohan S. Kankanhalli, Ioannis (Yiannis) Patras, Jianquan Liu, Yongkang Wong, Takahiro Komamizu, editors, Proceedings of the 2nd Workshop on User-centric Narrative Summarization of Long Videos, NarSUM 2023, Ottawa ON, Canada, 29 October 2023, pages 3-8. ACM, 2023. doi:10.1145/3607540.3617140

@inproceedings{0001ZF23,
  title = {An Empirical Study of Multilingual Scene-Text Visual Question Answering},
  author = {Lin Li 0001 and Haohan Zhang and Zeqin Fang},
  year = {2023},
  doi = {10.1145/3607540.3617140},
  url = {https://doi.org/10.1145/3607540.3617140},
  researchr = {https://researchr.org/publication/0001ZF23},
  cites = {0},
  citedby = {0},
  pages = {3--8},
  booktitle = {Proceedings of the 2nd Workshop on User-centric Narrative Summarization of Long Videos, NarSUM 2023, Ottawa ON, Canada, 29 October 2023},
  editor = {Mohan S. Kankanhalli and Ioannis (Yiannis) Patras and Jianquan Liu and Yongkang Wong and Takahiro Komamizu},
  publisher = {ACM},
}