Question Aware Vision Transformer for Multimodal Reasoning

Roy Ganz, Yair Kittenplon, Aviad Aberdam, Elad Ben-Avraham, Oren Nuriel, Shai Mazor, Ron Litman. Question Aware Vision Transformer for Multimodal Reasoning. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024, Seattle, WA, USA, June 16-22, 2024, pages 13861-13871. IEEE, 2024.