Consistent multimodal pre-training for visual tokenization

Ting Pan, Lulu Tang, Xinlong Wang, Xin Liu, Shiguang Shan. Consistent multimodal pre-training for visual tokenization. Science in China Series F: Information Sciences, 68(10), 2025.
