Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision

Xiaoshi Wu, Hadar Averbuch-Elor, Jin Sun, Noah Snavely. Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 418-427. IEEE, 2021.