UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding

Dave Zhenyu Chen, Ronghang Hu, Xinlei Chen, Matthias Nießner, Angel X. Chang. UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding. In IEEE/CVF International Conference on Computer Vision, ICCV 2023, Paris, France, October 1-6, 2023. pages 18063-18073, IEEE, 2023.
