Everything is a Video: Unifying Modalities Through Next-Frame Prediction

G. Thomas Hudson, Dean L. Slack, Thomas Winterbottom, Jamie Sterling, Chenghao Xiao, Junjie Shentu, Noura Al Moubayed. Everything is a Video: Unifying Modalities Through Next-Frame Prediction. In IEEE/CVF International Conference on Computer Vision, ICCV 2025, Honolulu, HI, USA, October 19-25, 2025, pages 22004-22013. IEEE, 2025.

