VideoBERT: A Joint Model for Video and Language Representation Learning

Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, Cordelia Schmid. VideoBERT: A Joint Model for Video and Language Representation Learning. In 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27 - November 2, 2019. Pages 7463-7472, IEEE, 2019.

@inproceedings{SunMV0S19,
  title = {VideoBERT: A Joint Model for Video and Language Representation Learning},
  author = {Chen Sun and Austin Myers and Carl Vondrick and Kevin Murphy and Cordelia Schmid},
  year = {2019},
  doi = {10.1109/ICCV.2019.00756},
  url = {https://doi.org/10.1109/ICCV.2019.00756},
  researchr = {https://researchr.org/publication/SunMV0S19},
  cites = {0},
  citedby = {0},
  pages = {7463--7472},
  booktitle = {2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27 - November 2, 2019},
  publisher = {IEEE},
  isbn = {978-1-7281-4803-8},
}