Does my multimodal model learn cross-modal interactions? It's harder to tell than you might think!

Jack Hessel, Lillian Lee. Does my multimodal model learn cross-modal interactions? It's harder to tell than you might think! In Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu, editors, Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020, pages 861-877. Association for Computational Linguistics, 2020.

@inproceedings{HesselL20-0,
  title = {Does my multimodal model learn cross-modal interactions? It's harder to tell than you might think!},
  author = {Jack Hessel and Lillian Lee},
  year = {2020},
  url = {https://www.aclweb.org/anthology/2020.emnlp-main.62/},
  researchr = {https://researchr.org/publication/HesselL20-0},
  cites = {0},
  citedby = {0},
  pages = {861--877},
  booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020},
  editor = {Bonnie Webber and Trevor Cohn and Yulan He and Yang Liu},
  publisher = {Association for Computational Linguistics},
  isbn = {978-1-952148-60-6},
}