Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training

Yangyi Chen, Hao Peng, Tong Zhang, Heng Ji. Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training. Trans. Mach. Learn. Res., 2026.
