- Open-Set Recognition in the Age of Vision-Language Models. Dimity Miller, Niko Sünderhauf, Alex Kenna, Keita Mason. 1-18
- Unsqueeze [CLS] Bottleneck to Learn Rich Representations. Qing Su, Shihao Ji. 19-37
- Robust Multimodal Learning via Representation Decoupling. Shicai Wei, Yang Luo, Yuji Wang, Chunbo Luo. 38-54
- Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models. Yasi Zhang, Peiyu Yu, Ying Nian Wu. 55-71
- WiMANS: A Benchmark Dataset for WiFi-Based Multi-user Activity Sensing. Shuokang Huang, KaiHan Li, Di You, Yichong Chen, Arvin Lin, Siying Liu, Xiaohui Li, Julie A. McCann. 72-91
- Embedding-Free Transformer with Inference Spatial Reduction for Efficient Semantic Segmentation. Hyunwoo Yu, Yubin Cho, Beoungwoo Kang, Seunghun Moon, Kyeongbo Kong, Suk-Ju Kang. 92-110
- VeCLIP: Improving CLIP Training via Visual-Enriched Captions. Zhengfeng Lai, Haotian Zhang, Bowen Zhang, Wentao Wu, Haoping Bai, Aleksei Timofeev, Xianzhi Du, Zhe Gan, Jiulong Shan, Chen-Nee Chuah, Yinfei Yang, Meng Cao. 111-127
- Three Things We Need to Know About Transferring Stable Diffusion to Visual Dense Prediction Tasks. Manyuan Zhang, Guanglu Song, Xiaoyu Shi, Yu Liu 0015, Hongsheng Li 0001. 128-145
- Learning Representations from Foundation Models for Domain Generalized Stereo Matching. Yongjian Zhang, Longguang Wang, Kunhong Li 0001, Yun Wang, Yulan Guo. 146-162
- Spike-Temporal Latent Representation for Energy-Efficient Event-to-Video Reconstruction. Jianxiong Tang, Jian-Huang Lai, Lingxiao Yang, Xiaohua Xie. 163-179
- Effective Lymph Nodes Detection in CT Scans Using Location Debiased Query Selection and Contrastive Query Representation in Transformer. Qinji Yu, Yirui Wang 0002, Ke Yan 0006, Haoshen Li, Dazhou Guo, Li Zhang 0047, Na Shen, Qifeng Wang, Xiaowei Ding, Le Lu 0001, Xianghua Ye, Dakai Jin. 180-198
- Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts. Shuangkang Fang, Yufeng Wang 0004, Yi-Hsuan Tsai, Yi Yang 0033, Wenrui Ding, Shuchang Zhou 0001, Ming-Hsuan Yang 0001. 199-216
- Event-Adapted Video Super-Resolution. Zeyu Xiao, Dachun Kai, Yueyi Zhang, Zheng-Jun Zha, Xiaoyan Sun 0001, Zhiwei Xiong. 217-235
- Look Hear: Gaze Prediction for Speech-Directed Human Attention. Sounak Mondal, Seoyoung Ahn, Zhibo Yang 0002, Niranjan Balasubramanian, Dimitris Samaras, Gregory J. Zelinsky, Minh Hoai. 236-255
- Raising the Ceiling: Conflict-Free Local Feature Matching with Dynamic View Switching. Xiaoyong Lu, Songlin Du. 256-273
- Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge. Haibo Wang, Weifeng Ge. 274-292
- Catastrophic Overfitting: A Potential Blessing in Disguise. Mengnan Zhao, Lihe Zhang, Yuqiu Kong, Baocai Yin. 293-310
- Long-Range Turbulence Mitigation: A Large-Scale Dataset and A Coarse-to-Fine Framework. Shengqi Xu, Run Sun, Yi Chang 0002, Shuning Cao, Xueyao Xiao, Luxin Yan. 311-329
- SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models. Yuwei Guo 0002, Ceyuan Yang, Anyi Rao, Maneesh Agrawala, Dahua Lin, Bo Dai 0002. 330-348
- Visual Alignment Pre-training for Sign Language Translation. Peiqi Jiao, Yuecong Min, Xilin Chen 0001. 349-367
- Parrot Captions Teach CLIP to Spot Text. Yiqi Lin, Conghui He, Alex Jinpeng Wang, Bin Wang 0065, Weijia Li, Mike Zheng Shou. 368-385
- Solving Motion Planning Tasks with a Scalable Generative Model. Yihan Hu, Siqi Chai, Zhening Yang, Jingyu Qian, Kun Li, Wenxin Shao, Haichao Zhang 0001, Wei Xu 0017, Qiang Liu. 386-404
- Griffon: Spelling Out All Object Locations at Any Granularity with Large Language Models. Yufei Zhan, Yousong Zhu, Zhiyang Chen, Fan Yang, Ming Tang 0001, Jinqiao Wang. 405-422
- Vision-Language Action Knowledge Learning for Semantic-Aware Action Quality Assessment. Huangbiao Xu, Xiao Ke, Yuezhou Li, Rui Xu, Huanqi Wu, Xiaofeng Lin, Wenzhong Guo. 423-440
- Knowledge Transfer with Simulated Inter-image Erasing for Weakly Supervised Semantic Segmentation. Tao Chen 0012, Xiruo Jiang, Gensheng Pei, Zeren Sun, Yucheng Wang, Yazhou Yao. 441-458
- BurstM: Deep Burst Multi-scale SR Using Fourier Space with Optical Flow. EungGu Kang, Byeonghun Lee, Sunghoon Im, Kyong Hwan Jin. 459-477
- Diffusion Reward: Learning Rewards via Conditional Video Diffusion. Tao Huang, Guangqi Jiang, Yanjie Ze, Huazhe Xu. 478-495