MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis

Jianbin Zheng, Daqing Liu, Chaoyue Wang, Minghui Hu, Zuopeng Yang, Changxing Ding, Dacheng Tao. MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis. International Journal of Computer Vision, 132(9):3537-3565, September 2024.

Abstract

Abstract is missing.