Everything is a Video: Unifying Modalities Through Next-Frame Prediction

G. Thomas Hudson, Dean L. Slack, Thomas Winterbottom, Jamie Sterling, Chenghao Xiao, Junjie Shentu, Noura Al Moubayed. Everything is a Video: Unifying Modalities Through Next-Frame Prediction. In IEEE/CVF International Conference on Computer Vision, ICCV 2025, Honolulu, HI, USA, October 19-25, 2025, pages 22004-22013. IEEE, 2025.