Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs

Ling Yang, Zhaochen Yu, Chenlin Meng, Minkai Xu, Stefano Ermon, Bin Cui. Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs. In Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27, 2024. Pages 56704-56721, OpenReview.net, 2024.
