Everything at Once - Multi-modal Fusion Transformer for Video Retrieval

Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogério Feris, David Harwath, James R. Glass, Hilde Kuehne. Everything at Once - Multi-modal Fusion Transformer for Video Retrieval. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022. Pages 19988-19997, IEEE, 2022.

@inproceedings{ShvetsovaCR0KFH22,
  title = {Everything at Once - Multi-modal Fusion Transformer for Video Retrieval},
  author = {Nina Shvetsova and Brian Chen and Andrew Rouditchenko and Samuel Thomas and Brian Kingsbury and Rogério Feris and David Harwath and James R. Glass and Hilde Kuehne},
  year = {2022},
  doi = {10.1109/CVPR52688.2022.01939},
  url = {https://doi.org/10.1109/CVPR52688.2022.01939},
  researchr = {https://researchr.org/publication/ShvetsovaCR0KFH22},
  pages = {19988--19997},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022},
  publisher = {IEEE},
  isbn = {978-1-6654-6946-3},
}