Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens

Yuxiao Chen, Jianbo Yuan, Yu Tian, Shijie Geng, Xinyu Li, Ding Zhou, Dimitris N. Metaxas, Hongxia Yang. Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023. pages 15095-15104, IEEE, 2023.

@inproceedings{0002Y0GLZMY23,
  title = {Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens},
  author = {Yuxiao Chen and Jianbo Yuan and Yu Tian and Shijie Geng and Xinyu Li and Ding Zhou and Dimitris N. Metaxas and Hongxia Yang},
  year = {2023},
  doi = {10.1109/CVPR52729.2023.01449},
  url = {https://doi.org/10.1109/CVPR52729.2023.01449},
  researchr = {https://researchr.org/publication/0002Y0GLZMY23},
  cites = {0},
  citedby = {0},
  pages = {15095--15104},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023},
  publisher = {IEEE},
  isbn = {979-8-3503-0129-8},
}