Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning

Antoine Yang, Arsha Nagrani, Paul Hongsuck Seo, Antoine Miech, Jordi Pont-Tuset, Ivan Laptev, Josef Sivic, Cordelia Schmid. Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023, pages 10714-10726. IEEE, 2023.

Authors

Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid