Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone

Zi-Yi Dou, Aishwarya Kamath, Zhe Gan, Pengchuan Zhang, Jianfeng Wang, Linjie Li, Zicheng Liu, Ce Liu, Yann LeCun, Nanyun Peng, Jianfeng Gao, Lijuan Wang. Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone. In Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, A. Oh, editors, Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022.

Authors

Zi-Yi Dou

Aishwarya Kamath

Zhe Gan

Pengchuan Zhang

Jianfeng Wang

Linjie Li

Zicheng Liu

Ce Liu

Yann LeCun

Nanyun Peng

Jianfeng Gao

Lijuan Wang
