CLIP-Guided Vision-Language Pre-training for Question Answering in 3D Scenes

Maria Parelli, Alexandros Delitzas, Nikolas Hars, Georgios Vlassis, Sotiris Anagnostidis, Gregor Bachmann, Thomas Hofmann. CLIP-Guided Vision-Language Pre-training for Question Answering in 3D Scenes. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023 - Workshops, Vancouver, BC, Canada, June 17-24, 2023, pages 5607-5612. IEEE, 2023.
