Alan Dao, Norapat Buppodom. VoxRep: Enhancing 3D Spatial Understanding in 2D Vision-Language Models via Voxel Representation. In Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2025, Singapore, October 22-24, 2025. pages 1464-1469, IEEE, 2025.