VGDIFFZERO: Text-To-Image Diffusion Models Can Be Zero-Shot Visual Grounders

Xuyang Liu, Siteng Huang, Yachen Kang, Honggang Chen, Donglin Wang. VGDIFFZERO: Text-To-Image Diffusion Models Can Be Zero-Shot Visual Grounders. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2024, Seoul, Republic of Korea, April 14-19, 2024, pages 2765-2769. IEEE, 2024.