End-to-end Generative Pretraining for Multimodal Video Captioning

Paul Hongsuck Seo, Arsha Nagrani, Anurag Arnab, Cordelia Schmid. End-to-end Generative Pretraining for Multimodal Video Captioning. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pages 17938-17947. IEEE, 2022.

Authors

Paul Hongsuck Seo

Arsha Nagrani

Anurag Arnab

Cordelia Schmid
