ProVLA: Compositional Image Search with Progressive Vision-Language Alignment and Multimodal Fusion

Zhizhang Hu, Xinliang Zhu, Son Tran, René Vidal, Arnab Dhua. ProVLA: Compositional Image Search with Progressive Vision-Language Alignment and Multimodal Fusion. In IEEE/CVF International Conference on Computer Vision, ICCV 2023 - Workshops, Paris, France, October 2-6, 2023. pages 2764-2769, IEEE, 2023.
