Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation

Yuying Ge, Yizhuo Li 0001, Yixiao Ge, Ying Shan. Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025, Nashville, TN, USA, June 11-15, 2025, pages 13606-13617. Computer Vision Foundation / IEEE, 2025.

Authors

Yuying Ge

Yizhuo Li 0001

Yixiao Ge

Ying Shan
