An Empirical Study of Multilingual Scene-Text Visual Question Answering

Lin Li 0001, Haohan Zhang, Zeqin Fang. An Empirical Study of Multilingual Scene-Text Visual Question Answering. In Mohan S. Kankanhalli, Ioannis (Yiannis) Patras, Jianquan Liu, Yongkang Wong, Takahiro Komamizu, editors, Proceedings of the 2nd Workshop on User-centric Narrative Summarization of Long Videos, NarSUM 2023, Ottawa ON, Canada, 29 October 2023. Pages 3-8, ACM, 2023.

Authors

Lin Li 0001

Haohan Zhang

Zeqin Fang
