Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone

Zi-Yi Dou, Aishwarya Kamath, Zhe Gan, Pengchuan Zhang, Jianfeng Wang, Linjie Li, Zicheng Liu, Ce Liu, Yann LeCun, Nanyun Peng, Jianfeng Gao, Lijuan Wang. Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone. In Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, A. Oh, editors, Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022.

@inproceedings{DouKGZWL00LPGW22,
  title = {Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone},
  author = {Zi-Yi Dou and Aishwarya Kamath and Zhe Gan and Pengchuan Zhang and Jianfeng Wang and Linjie Li and Zicheng Liu and Ce Liu and Yann LeCun and Nanyun Peng and Jianfeng Gao and Lijuan Wang},
  year = {2022},
  url = {http://papers.nips.cc/paper_files/paper/2022/hash/d4b6ccf3acd6ccbc1093e093df345ba2-Abstract-Conference.html},
  researchr = {https://researchr.org/publication/DouKGZWL00LPGW22},
  booktitle = {Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022},
  editor = {Sanmi Koyejo and S. Mohamed and A. Agarwal and Danielle Belgrave and K. Cho and A. Oh},
  isbn = {9781713871088},
}