VideoPrism: A Foundational Visual Encoder for Video Understanding

Long Zhao, Nitesh Bharadwaj Gundavarapu, Liangzhe Yuan, Hao Zhou, Shen Yan, Jennifer J. Sun, Luke Friedman, Rui Qian, Tobias Weyand, Yue Zhao, Rachel Hornung, Florian Schroff, Ming-Hsuan Yang, David A. Ross, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko, Ting Liu, Boqing Gong. VideoPrism: A Foundational Visual Encoder for Video Understanding. In Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27, 2024. OpenReview.net, 2024.

Abstract

Abstract is missing.