3L: Language-based Video Editing via Multi-Modal Multi-Level Transformers

Tsu-Jui Fu, Xin Eric Wang, Scott T. Grafton, Miguel P. Eckstein, William Yang Wang. 3L: Language-based Video Editing via Multi-Modal Multi-Level Transformers. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pages 10503-10512. IEEE, 2022.
