- Generating Physically Realistic and Directable Human Motions from Multi-modal Inputs. Aayam Shrestha, Pan Liu, Germán Ros, Kai Yuan, Alan Fern. 1-17
- CoTracker: It Is Better to Track Together. Nikita Karaev, Ignacio Rocco, Benjamin Graham, Natalia Neverova, Andrea Vedaldi, Christian Rupprecht 0001. 18-35
- SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models. Ziyi Lin, Dongyang Liu, Renrui Zhang, Peng Gao 0007, Longtian Qiu, Han Xiao, Han Qiu 0010, Wenqi Shao, Keqin Chen, Jiaming Han, Siyuan Huang, Yichi Zhang, Xuming He 0001, Yu Qiao 0001, Hongsheng Li 0001. 36-55
- PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology. Yuxuan Sun, Hao Wu, Chenglu Zhu, Sunyi Zheng, Qizi Chen, Kai Zhang 0033, YunLong Zhang, Dan Wan, Xiaoxiao Lan, Mengyue Zheng, Jingxiong Li, Xinheng Lyu, Tao Lin 0004, Lin Yang 0002. 56-73
- Improving Adversarial Transferability via Model Alignment. Avery Ma, Amir Massoud Farahmand, Yangchen Pan, Philip Torr 0001, Jindong Gu. 74-92
- RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios. Wenhao Ding, Yulong Cao, Ding Zhao, Chaowei Xiao, Marco Pavone 0001. 93-110
- ADen: Adaptive Density Representations for Sparse-View Camera Pose Estimation. Hao Tang, Weiyao Wang 0001, Pierre Gleize, Matt Feiszli. 111-128
- Embodied Understanding of Driving Scenarios. Yunsong Zhou, Linyan Huang, Qingwen Bu, Jia Zeng, Tianyu Li, Hang Qiu 0001, Hongzi Zhu, Minyi Guo, Yu Qiao 0001, Hongyang Li 0001. 129-148
- Learning to Drive via Asymmetric Self-Play. Chris Zhang 0001, Sourav Biswas 0001, Kelvin Wong, Kion Fallah, Lunjun Zhang, Dian Chen, Sergio Casas 0002, Raquel Urtasun. 149-168
- OpenIns3D: Snap and Lookup for 3D Open-Vocabulary Instance Segmentation. Zhening Huang, Xiaoyang Wu 0002, Xi Chen, Hengshuang Zhao, Lei Zhu 0003, Joan Lasenby. 169-185
- ViLA: Efficient Video-Language Alignment for Video Question Answering. Xijun Wang 0002, Junbang Liang, Chun-Kai Wang, Kenan Deng 0001, Yu Lou 0003, Ming C. Lin, Shan Yang. 186-204
- Factorizing Text-to-Video Generation by Explicit Image Conditioning. Rohit Girdhar, Mannat Singh, Andrew Brown, Quentin Duval, Samaneh Azadi, Sai Saketh Rambhatla, Akbar Shah, Xi Yin 0001, Devi Parikh, Ishan Misra. 205-224
- MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices. Yang Zhao, Yanwu Xu 0003, Zhisheng Xiao, Haolin Jia, Tingbo Hou. 225-242
- Open-Set Biometrics: Beyond Good Closed-Set Models. Yiyang Su, Minchul Kim, Feng Liu 0037, Anil K. Jain 0001, Xiaoming Liu 0002. 243-261
- UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening. Siyuan Cheng 0005, Guangyu Shen, Kaiyuan Zhang 0002, Guanhong Tao 0001, Shengwei An, Hanxi Guo, ShiQing Ma, Xiangyu Zhang 0001. 262-281
- Which Model Generated This Image? A Model-Agnostic Approach for Origin Attribution. Fengyuan Liu, Haochen Luo, Yiming Li, Philip Torr 0001, Jindong Gu. 282-301
- Osmosis: RGBD Diffusion Prior for Underwater Image Restoration. Opher Bar Nathan, Deborah Levy, Tali Treibitz, Dan Rosenbaum. 302-319
- Towards Adaptive Pseudo-Label Learning for Semi-Supervised Temporal Action Localization. Feixiang Zhou, Bryan M. Williams 0001, Hossein Rahmani 0001. 320-338
- Computing the Lipschitz Constant Needed for Fast Scene Recovery from CASSI Measurements. Anders Holst, Niels Chr. Overgaard. 339-353
- DatasetNeRF: Efficient 3D-Aware Data Factory with Generative Radiance Fields. Yu Chi 0002, Fangneng Zhan, Sibo Wu, Christian Theobalt, Adam Kortylewski. 354-372
- Flowed Time of Flight Radiance Fields. Mikhail Okunev, Marc Mapeke, Benjamin Attal, Christian Richardt, Matthew O'Toole, James Tompkin 0001. 373-389
- 3D-GOI: 3D GAN Omni-Inversion for Multifaceted and Multi-object Editing. Haoran Li, Long Ma, Haolin Shi, Yanbin Hao, Yong Liao, Lechao Cheng, Peng Yuan Zhou. 390-406
- Fast Registration of Photorealistic Avatars for VR Facial Animation. Chaitanya Patel, Shaojie Bai, Te-Li Wang, Jason M. Saragih, Shih-En Wei. 407-423
- CoPT: Unsupervised Domain Adaptive Segmentation Using Domain-Agnostic Text Embeddings. Cristina Mata, Kanchana Ranasinghe, Michael S. Ryoo. 424-440
- HiFi-Score: Fine-Grained Image Description Evaluation with Hierarchical Parsing Graphs. Ziwei Yao, Ruiping Wang 0001, Xilin Chen 0001. 441-458
- Image-to-Lidar Relational Distillation for Autonomous Driving Data. Anas Mahmoud 0002, Ali Harakeh, Steven L. Waslander. 459-475
- Thinking Outside the BBox: Unconstrained Generative Object Compositing. Gemma Canet Tarres, Zhe Lin 0001, Zhifei Zhang, Jianming Zhang 0001, Yizhi Song, Dan Ruta, Andrew Gilbert, John P. Collomosse, Soo Ye Kim. 476-495