Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes

Alexandros Delitzas, Maria Parelli, Nikolas Hars, Georgios Vlassis, Sotirios-Konstantinos Anagnostidis, Gregor Bachmann, Thomas Hofmann. Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes. In 34th British Machine Vision Conference 2023, BMVC 2023, Aberdeen, UK, November 20-24, 2023. pages 748-749, BMVA Press, 2023.
