- AI-Mediated Human Interaction. Shalini De Mello. 1
- Next Phase of Research on Multimodal Foundation Models: From Alignments to Content Generation and Quality Assessment. Tat-Seng Chua. 2
- SenseCam and Isotyping: The Challenges and Benefits of Working with New Hardware. Steve Hodges 0001. 3-4
- MotionRefineNet: Fine-Grained Pose Sequence Smoothing and Refinement. Haolun Li 0001, Weihuang Liu, Jiateng Liu, Zhenhua Tang, Chi-Man Pun, Qiguang Miao, Feng Xu 0005, Hao Gao 0005. 5-14
- Change-UP: Advancing Visualization and Inference Capability for Multi-level Remote Sensing Change Interpretation. Mo Yang, Luo Chen, Jiali Zhou. 15-24
- Cross Time Domain Intention Interaction for Conditional Trajectory Prediction. Yuxiang Zhao, Wei Huang, Haipeng Zeng, Huan Zhao, Yujie Song. 25-33
- SIDA: Synthetic Image Driven Zero-shot Domain Adaptation. Ye Chan Kim, SeungJu Cha 0001, Si-Woo Kim, Taewhan Kim, Dong Jin Kim. 34-42
- Efficient Video Anomaly Detection via Scene-Dependent Memory Assisted Inter-Frame RGB Difference Reconstruction. Han Hu 0009, Wenli Du, Bing Wang. 43-51
- Occlusion-Aware Temporally Consistent Amodal Completion for 3D Human-Object Interaction Reconstruction. Hyungjun Doh, Dong-In Lee, Seunggeun Chi, Pin-Hao Huang, Kwonjoon Lee, Sangpil Kim, Karthik Ramani. 52-61
- Zero-Shot Multimodal Fact-Checking with Conceptual Reasoning. Guoyi Li, Die Hu 0004, Haozhe Li, Qirui Tang, Xiaomeng Fu, Yulei Wu, Xiaodan Zhang 0004, Honglei Lyu. 62-71
- 3DGabSplat: 3D Gabor Splatting for Frequency-adaptive Radiance Field Rendering. Junyu Zhou, Yuyang Huang, Wenrui Dai, Junni Zou, Ziyang Zheng, Nuowen Kan, Chenglin Li, Hongkai Xiong. 72-81
- 2: Dual-Stage Invariance Transfer Learning for Generalizable Document Image Tampering Localization. Songze Li, Yunfei Guo, Shen Chen, Bin Li 0011, Kaiqing Lin, Changsheng Chen 0001, Haodong Li 0001, Taiping Yao, Shouhong Ding. 82-91
- RobustVisH: Robust Visual-Haptic Cross-Modal Recognition under Transmission Interference. Rouqi Zhang, Chengdi Lu, Hancheng Lu, Yang Cao, Tiesong Zhao. 92-100
- Dome-DETR: DETR with Density-Oriented Feature-Query Manipulation for Efficient Tiny Object Detection. Zhangchi Hu, Peixi Wu, Jie Chen, Huyue Zhu, Yijun Wang, Yansong Peng, Hebei Li, Xiaoyan Sun 0001. 101-110
- Butter: Frequency Consistency and Hierarchical Fusion for Autonomous Driving Object Detection. Xiaojian Lin, Wenxin Zhang 0005, Yuchu Jiang, Wangyu Wu, Yiran Guo, Kangxu Wang, Zongzheng Zhang, Guijin Wang, Lei Jin, Hao Zhao. 111-120
- REMOTE: A Unified Multimodal Relation Extraction Framework with Multilevel Optimal Transport and Mixture-of-Experts. Xinkui Lin, Yongxiu Xu, Minghao Tang, Shilong Zhang, Hongbo Xu, Hao Xu, Yubin Wang. 121-130
- Boosting Single-Domain Generalized Object Detection via Vision-Language Knowledge Interaction. Xiaoran Xu, Jiangang Yang, Wenyue Chong, Wenhui Shi, Shichu Sun, Jing Xing, Jian Liu. 131-140
- Spatiotemporal Degradation-Aware 3D Gaussian Splatting for Realistic Underwater Scene Reconstruction. Shaohua Liu 0003, Ning Gao 0004, Zuoya Gu, Hongkun Dou, Yue Deng 0001, Hongjue Li. 141-150
- EBaR: Efficient Buffer and Resetting for Single-Sample Continual Test-Time Adaptation. Tianyi Ma, Maoying Qiao. 151-160
- RWKV-PCSSC: Exploring RWKV Model for Point Cloud Semantic Scene Completion. Wenzhe He, Xiaojun Chen, Wentang Chen, Hongyu Wang, Ying Liu 0027, Ruihui Li. 161-170
- Efficient Trajectory Space-Time Super-Resolution for Fast Live-cell Imaging. Ruian He, Zixian Zhang, Ri Cheng, Weimin Tan, Bo Yan. 171-179
- Towards Robust Multimodal Domain Generalization via Modality-Domain Joint Adversarial Training. Hongzhao Li, Hualei Wan, Liangzhi Zhang, Mingyuan Jiu, Shupan Li, Mingliang Xu 0001, Muhammad Haris Khan. 180-188
- Object-Preserving Counterfactual Diffusion Augmentation for Single-Domain Generalized Object Detection. Hongda Qin, Xiao Lu 0002, Zhiyong Wei, Ningjiang Chen. 189-198
- Unleashing the Power of Data Generation in One-Pass Outdoor LiDAR Localization. Yidong Chen 0006, Qi Li, Yuyang Yang, Wen Li, Sheng Ao, Cheng Wang 0003. 199-208
- EvRAW: Event-guided Structural and Color Modeling for RAW-to-sRGB Image Reconstruction. Wenli Zheng, Huiyuan Fu, Xicong Wang, Hao Kang, Chuanming Wang, Jin Liu 0024, Zekai Xu, Heng Zhang 0042, Huadong Ma. 209-218
- From Continuous to Discrete: Cross-Domain Collaborative General Speech Enhancement via Hierarchical Language Models. Zhaoxi Mu, Rilin Chen, Andong Li, Meng Yu 0003, Xinyu Yang, Dong Yu 0001. 219-228
- EDeF-Net: Spatio-temporal Association Network for Flicker Removal in Event Streams. Jin Han 0001, Yixin Yang 0008, Zhan Zhan, Boxin Shi, Imari Sato. 229-237
- BoxSeg: Quality-Aware and Peer-Assisted Learning for Box-supervised Instance Segmentation. Jinxiang Lai, Wenlong Wu, Jiawei Zhan, Jian Li 0062, Bin-Bin Gao, Jun Liu, Jie Zhang, Song Guo 0001. 238-246
- CFSSeg: Closed-Form Solution for Class-Incremental Semantic Segmentation of 2D Images and 3D Point Clouds. Jiaxu Li, Rui Li, Jianyu Qi, Songning Lai, Linpu Lv, Kejia Fan, Jianheng Tang 0001, Yutao Yue, Dongzhan Zhou, Yunhuai Liu, Huiping Zhuang. 247-256
- Interpreting Radiologist's Intention from Eye Movements in Chest X-ray Diagnosis. Trong-Thang Pham, Anh Nguyen 0003, Zhigang Deng 0001, Carol C. Wu, Hien Nguyen, Ngan Le. 257-266
- FGRFlow: Learning Fine-Grained Rigidity Scene Flow from 4D Radar Point Cloud. Mingliang Zhai, Yiheng Wang, Haidong Hu, Chi-Man Pun, Hao Gao 0005. 267-276
- Querying Autonomous Vehicle Point Clouds: Enhanced by 3D Object Counting with CounterNet. Xiaoyu Zhang, Zhifeng Bao, Hai Dong, Ziwei Wang, Jiajun Liu. 277-285
- DS-Det: Single-Query Paradigm and Attention Disentangled Learning for Flexible Object Detection. Guiping Cao, Xiangyuan Lan, Wenjian Huang 0001, Jianguo Zhang 0001, Dongmei Jiang, Yaowei Wang 0001. 286-295
- Video-based Transparent Object Segmentation via Temporal Feature Aggregation. Zhen Wang, Dongyuan Li, Yaozu Wu, Peide Zhu, Shiyin Tan, Renhe Jiang. 296-304
- G2LFormer: Global-to-Local Query Enhancement for Robust Table Structure Recognition. Haosheng Cai, Yang Xue 0001. 305-314
- SPAN: Continuous Modeling of Suspicion Progression for Temporal Intention Localization. Xinyi Hu, Yuran Wang, Ruixu Zhang, Yue Li, Wenxuan Liu 0008, Zheng Wang 0007. 315-323
- Edge-aware Affinity Enhancement for Image Manipulation Localization. Tianyi Zhang, Qinglong Lin, Yang Hu, Pengming Feng, Rubo Zhang. 324-332
- HydraMamba: Multi-Head State Space Model for Global Point Cloud Learning. Kanglin Qu, Pan Gao 0001, Qun Dai, Yuanhao Sun. 333-342
- UIS-Mamba: Exploring Mamba for Underwater Instance Segmentation via Dynamic Tree Scan and Hidden State Weaken. Runmin Cong, Zongji Yu, Hao Fang 0010, Haoyan Sun, Sam Kwong. 343-352
- MiraGe: Multimodal Discriminative Representation Learning for Generalizable AI-Generated Image Detection. Kuo Shi, Jie Lu, Shanshan Ye, Guangquan Zhang 0001, Zhen Fang 0001. 353-361
- Text-Promptable Propagation for Referring Medical Image Sequence Segmentation. Runtian Yuan, Mohan Chen 0001, Jilan Xu, Ling Zhou, Qingqiu Li, Yuejie Zhang, Rui Feng 0001, Tao Zhang 0022, Shang Gao 0003. 362-371
- Multiple Queries with Multiple Keys: A Precise Prompt Matching Paradigm for Prompt-based Continual Learning. Dunwei Tu, Huiyu Yi, YuChi Wang, Baile Xu, Jian Zhao 0013, Furao Shen. 372-381
- From Language to Instance: Generative Visual Prompting for Zero-shot Camouflaged Object Detection. Zihou Zhang, Hao Li 0093, Zhengwei Yang 0001, Zechao Hu, Liang Li 0003, Zheng Wang 0007. 382-391
- From Semantics, Scene to Instance-awareness: Distilling Foundation Model for Open-vocabulary Grounded Situation Recognition. Chen Cai, Tianyi Liu, Jianjun Gao 0005, Wenyang Liu, Kejun Wu, Ruoyu Wang, Yi Wang 0068, Soo Chin Liew. 392-401
- TFPA: Text Features Guided Dynamic Parameter Adjustment for Few Shot Action Recognition. Hanyu Guo, Suzhou Que, Junlong Gao, Hanzi Wang. 402-411
- DOMR: Establishing Cross-View Segmentation via Dense Object Matching. Jitong Liao, YuLu Gao, Shaofei Huang 0001, Jialin Gao, Jie Lei 0002, Ronghua Liang, Si Liu 0001. 412-421
- NeuroPump: Simultaneous Geometric and Color Rectification for Underwater Images. Yue Guo, Haoxiang Liao, Haibin Ling, BingYao Huang. 422-431
- Client-Server Co-design with Multi-modal Codebooks Makes Better and Faster Federate Knowledge Sharing. Yichi Zhang 0009, Zhuo Chen 0007, Lingbing Guo, Yajing Xu, Lei Liang 0002, Wen Zhang 0015, Huajun Chen. 432-440
- Severe Light, Textureless Sight: A Benchmark for Extreme Exposure Correction. Bo Wang 0108, Jin Liu 0024, Huiyuan Fu, Xin Wang 0001, Heng Zhang 0042, Huadong Ma. 441-449
- APG-MOS: Auditory Perception Guided-MOS Predictor for Synthetic Speech. Zhicheng Lian, Lizhi Wang 0001, Hua Huang 0001. 450-459
- Geo-CF2Net: Geometry-Prior Cross-Frequency Interactive Fusion Network for 3D Human Action Recognition. Zhaoyu Chen, Qian Huang, Xing Li, Yunfei Zhang, Shihao Han, Ge Gao, Yirui Wu, Xin Li 0090, Ziyang Yin. 460-469
- Focus on the Object: Gradient-based Feature Modulation for Camouflaged Object Segmentation. Naisong Luo, Yuan Wang 0064, Yuwen Pan, Rui Sun 0006. 470-478
- An Event-tailored State-Space Based Model for Pedestrian Detection. Liuyi Li, Feng Shi, Jian Wang, Jinjing Zhu, Wenze Shao. 479-488
- OV-VOD: Open-Vocabulary Video Object Detection. Zhihong Zheng, Yang Cao, Junlong Gao, Hanzi Wang. 489-498
- SeMi: When Imbalanced Semi-Supervised Learning Meets Mining Hard Examples. Yin Wang 0004, Zixuan Wang, Hao Lu 0009, Zhen Qin 0004, Hailiang Zhao, Guanjie Cheng, Xin Du, Ge Su, Li Kuang, MengChu Zhou, ShuiGuang Deng. 499-507
- DualSG: A Dual-Stream Explicit Semantic-Guided Multivariate Time Series Forecasting Framework. Kuiye Ding, Fanda Fan, Yao Wang, Ruijie Jian, Xiaorui Wang, Luqi Gong, Yishan Jiang, Chunjie Luo, Jianfeng Zhan. 508-517
- ESOD: Event-Based Small Object Detection. Quanmin Liang, Jinyi Lu, Qiang Li, Shuai Liu 0009, Zhihao Zhao, Yinzheng Zhao, Wei Zhang 0161, Kai Huang 0001, Yonghong Tian 0001. 518-527
- Cross-Modal Metrics for Capturing Correspondences Between Music Audio and Stage Lighting Signals. Michael Kohl, Tobias Wursthorn, Christof Weiß. 528-534
- Collaborative Cloud-edge Generalized Category Discovery. Yingbing Liu, Fei Ma 0001, Yanan Wu, Xinxin Zuo, Fan Zhang 0007, Yang Wang 0003. 535-543
- Sample-level Adaptive Knowledge Distillation for Action Recognition. Ping Li, Chenhao Ping, Wenxiao Wang, Mingli Song. 544-552
- OV-DAVEL: Towards Open-Vocabulary Dense Audio-Visual Event Localization in Untrimmed Videos. Jiale Yu, Baopeng Zhang, Zhu Teng, Jianping Fan 0007. 553-562
- Retaining Temporal Semantics and Relation Topologies for Continual Weakly-Supervised Audio-Visual Video Parsing. Jie Fu 0004, Bingkun Bao. 563-572
- TNT-GS: Truncated and Tailored Gaussian Splatting. Xiaofeng Liu 0001, Guanchen Meng, Chongyang Feng, Risheng Liu, Zhongxuan Luo, Xin Fan. 573-581
- Detect Any Sound: Open-Vocabulary Sound Event Detection with Multi-Modal Queries. Pengfei Cai, Yan Song 0001, Qing Gu 0002, Nan Jiang 0022, Haoyu Song, Ian McLoughlin 0001. 582-591
- HiProbe-VAD: Video Anomaly Detection via Hidden States Probing in Tuning-Free Multimodal LLMs. Zhaolin Cai, Fan Li, Ziwei Zheng, Yanjun Qin. 592-601
- ACMamba: Fast Unsupervised Anomaly Detection via An Asymmetrical Consensus State Space Model. Guanchun Wang, Xiangrong Zhang, Yifei Zhang, Zelin Peng, Tianyang Zhang 0002, Xu Tang 0004, Licheng Jiao. 602-611
- DHGCN: Dual HyperGraph Convolutional Network for EEG-Based Auditory Attention Detection. Jian Zhou 0006, Yingjie Xie, Cunhang Fan, Huabin Wang, Zhao Lv, Liang Tao. 612-620
- Proactive Deepfake Detection via Self-Verifiable Semantic Watermarking. Peiqi Jiang, Bohan Lei, Yuhao Sun, Lingyun Yu 0002, Zhineng Chen, Hongtao Xie, Yongdong Zhang. 621-630
- Self-Supervised Vision Graph Neural Networks Based on Contrastive Learning. Yuzhen Li, Yuehui Han, Jianjun Qian, Jian Yang. 631-640
- Pushing Trade-Off Boundaries: Compact yet Effective Remote Sensing Change Detection. Luosheng Xu, Dalin Zhang 0001, Zhaohui Song. 641-649
- RWKV3D: An RWKV-Based Model with Multiple Training Strategies for Point Cloud Analysis. Chenglong Sun, Shijie Pang, Yuzheng Wang, Lizhe Qi. 650-659
- Adaspeaker: Learning Discriminative Speaker Representations with Gradient-Aware Adaptive Scaling. Jinghan Liu, Xingmei Wang 0002, Jiaxiang Meng. 660-668
- Beyond Sparse Keypoints: Dense Pose Modeling for Robust Gait Recognition. Wenpeng Lang, Saihui Hou, Yongzhen Huang. 669-678
- From Pixels to Temporal Correlations: Learning Informative Representations for Reinforcement Learning Pre-training. Jinwen Wang, Youfang Lin, Xiaobo Hu, Siyu Yang, Sheng Han, Shuo Wang 0031, Kai Lv 0002. 679-688
- MuCodec: Ultra Low-Bitrate Music Codec for Music Generation. Yaoxun Xu, Hangting Chen, Jianwei Yu 0001, Wei Tan 0011, Shun Lei, Zhiwei Lin, Rongzhi Gu, Zhiyong Wu 0001. 689-698
- TriGS: Tri-consistency 3D Gaussian Splatting from Sparse and Unposed Views. Chi Huang, Qi Zhang 0071, Qian Zhang 0051, Nan Li 0048, Yipu Gong, Xiaowei Wang, Wei Feng 0005. 699-708
- High-Performance Discriminative Tracking with Spatio-Temporal Template Fusion. Xuedong He, Huiying Xu, Xinzhong Zhu, Hongbo Li. 709-718
- Multi-Task Label Discovery via Hierarchical Task Tokens for Partially Annotated Dense Predictions. Jingdong Zhang, Hanrong Ye, Xin Li 0003, Wenping Wang 0001, Dan Xu 0002. 719-728
- MIPS: A Multimodal Infinite Polymer Sequence Pre-training Framework for Polymer Property Prediction. Jiaxi Wang, Yaosen Min, Xun Zhu, Miao Li 0003, Ji Wu. 729-738
- Cause and Effect: Video Social Relationship Recognition from Causal Perspective. Yuxuan Zhang, Bo Wang, Yu Du, Yangfu Zhu, Haorui Wang, Guangyao Su, Tao Zhou, Bin Wu. 739-747
- A Highly Clean Recipe Dataset with Ingredient States Annotation for State Probing Task. Mashiro Toyooka, Kiyoharu Aizawa, Yoko Yamakata. 748-756
- From Pixels to Semantics: A Novel MLLM-Driven Approach for Explainable Tampered Text Detection. Guitao Xu, Ziqi Yi, Peirong Zhang, Jiahuan Cao, Shihang Wu, Lianwen Jin. 757-766
- Deep Graph Clustering with Disentangled Representation Learning. Yifan Wang 0014, Yuntai Ding, Yiyang Gu, Ziyue Qiao, Chong Chen, Xian-Sheng Hua 0001, Ming Zhang 0004, Wei Ju 0001. 767-776
- RATopo: Improving Lane Topology Reasoning via Redundancy Assignment. Han Li, Shaofei Huang 0001, Longfei Xu, YuLu Gao, Beipeng Mu, Si Liu 0001. 777-786
- BiOMamba: Mamba-based Forward-Then-Backward Temporal Modeling for Online Action Detection and Anticipation. Sensen Wang, Yuehu Liu, Chi Zhang. 787-795
- Shallow Features Matter: Hierarchical Memory with Heterogeneous Interaction for Unsupervised Video Object Segmentation. Xiangyu Zheng, Songcheng He, Wanyun Li, Xiaoqiang Li 0002, Wei Zhang. 796-805
- Camera-Specific Imaging Simulation for Raw Domain Image Super Resolution. Xiaobo Liu, Henglu Wei, Chuxi Yang, Wei Yu, Xudong Zhao, Xiangyang Ji. 806-815
- PurifyGen: A Risk-Discrimination and Semantic-Purification Model for Safe Text-to-Image Generation. Zongsheng Cao, Yangfan He, Anran Liu, Jun Xie, Zhepeng Wang 0002, Feng Chen. 816-825
- FG-Midiformer: A Symbolic Music Understanding Model towards Fine-Grained Learning of Multi-Attributes. Haonan Cheng, Junwei Zhang, Hengyan Huang, Long Ye. 826-835
- VideoForest: Person-Anchored Hierarchical Reasoning for Cross-Video Question Answering. Yiran Meng, Junhong Ye, Wei Zhou 0021, Guanghui Yue 0001, Xudong Mao, Ruomei Wang 0001, Baoquan Zhao. 836-845
- Towards Fine-Grained Human Motion Video Captioning. Guorui Song, Guocun Wang, Zhe Huang, Jing Lin, Xuefei Zhe, Jian Li, Haoqian Wang. 846-855
- Learning the Anchors with Similar Distributions to Original Data for Multi-view Clustering. Junpu Zhang, Shengju Yu, Suyuan Liu, Siwei Wang 0001, Miaomiao Li 0001, Xinwang Liu 0002, En Zhu, Kunlun He. 857-866
- Learning Long-Range Action Representation by Two-Stream Mamba Pyramid Network for Figure Skating Assessment. Fengshun Wang, Qiurui Wang, Peilin Zhao. 867-875
- Gather and Trace: Rethinking Video TextVQA from an Instance-oriented Perspective. Yan Zhang, Gangyan Zeng, Daiqing Wu, Huawen Shen, Binbin Li, Yu Zhou 0015, Can Ma, Xiaojun Bi 0002. 876-885
- Selective Shift: Towards Personalized Domain Adaptation in Multi-Agent Collaborative Perception. Hui Zhang 0091, Yiteng Xu, Yonglin Tian, Yidong Li, Tiago H. Falk, Fei-Yue Wang. 886-895
- Enhancing Pseudo-Boxes via Data-Level LiDAR-Camera Fusion for Unsupervised 3D Object Detection. Mingqian Ji, Jian Yang, Shanshan Zhang 0001. 896-904
- FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing. Gaoxiang Cong 0001, Liang Li 0003, Jiadong Pan, Zhedong Zhang, Amin Beheshti, Anton van den Hengel, Yuankai Qi, Qingming Huang. 905-914
- DUIMC: Deep Unbalanced Incomplete Multi-View Clustering via Graph Constrained Imputation and Contrastive Learning. Wenhui Wu 0001, Guanqi Wen, Le Ou-Yang, Ran Wang 0001, Sam Kwong. 915-924
- EmbodiedOcc++: Boosting Embodied 3D Occupancy Prediction with Plane Regularization and Uncertainty Sampler. Hao Wang, Xiaobao Wei, Xiaoan Zhang, Jianing Li, Chengyu Bai, Ying Li, Ming Lu, Wenzhao Zheng, Shanghang Zhang. 925-934
- Large-Small Model Synergy with Multimodal Fine-Grained Heuristics for Knowledge-Based Visual Question Answering. Zhongfan Sun, Kan Guo, Yongli Hu, Daxin Tian, Qingqing Gao, Jiapu Wang, Junbin Gao, Yanfeng Sun, Baocai Yin. 935-944
- MixedGaussianAvatar: Realistically and Geometrically Accurate Head Avatar via Mixed 2D-3D Gaussians. Peng Chen, Xiaobao Wei, Qingpo Wuwu, Xinyi Wang, Xingyu Xiao, Ming Lu. 945-954
- DRKF: Decoupled Representations with Knowledge Fusion for Multimodal Emotion Recognition. Peiyuan Jiang, Yao Liu, Qiao Liu 0003, Zongshun Zhang, Jiaye Yang, Lu Liu, Daibing Yao. 955-964
- Accelerating Long Video Understanding via Compressed Scene Graph-Enabled Chain-of-Thought. Tao Ling, Siping Shi, Dan Wang 0002. 965-974
- BadMDA: Towards Backdoor Injection during Domain Adaptation to Collapse Multi-Agent Perception. Tong Chen, Bowen Du 0001, Jiejie Zhao, Hanyang Xia, Haiquan Wang, Jiakai Wang. 975-983
- Epipolar Consistency-based Network for Structure-Aware LF Semantic Segmentation. Chen Gao, Youfang Lin, Wenbin Wang, Shuo Zhang 0003. 984-992
- Single Domain Generalization for Multimodal Cross-Cancer Prognosis via Dirac Rebalancer and Distribution Entanglement. Jia-Xuan Jiang, Jiashuai Liu, Hongtao Wu, Yifeng Wu, Zhong Wang 0006, Qi Bi, Yefeng Zheng 0001. 993-1002
- StereoINR: Cross-View Geometry Consistent Stereo Super Resolution with Implicit Neural Representation. Yi Liu, Xinyi Liu 0002, Yi Wan 0001, Panwang Xia, Qiong Wu, Yongjun Zhang 0002. 1003-1012
- LEAF-Mamba: Local Emphatic and Adaptive Fusion State Space Model for RGB-D Salient Object Detection. Lanhu Wu, Zilin Gao, Hao Fei 0001, Mong-Li Lee, Wynne Hsu. 1013-1022
- HGCF: Hierarchical Geometry-Color Fusion for Multimodal Industrial Anomaly Detection. Min Li 0033, Jinghui He, Jiachen Li, Delong Han, Jin Wan, Gang Li 0005. 1023-1031
- Outlier-Aware Model Merging for Efficient Multitask Inference. Qiyuan Zhu, Lujun Li 0001, Dezhi Li, Jiacheng Liu, Pengyu Cheng, Yucheng Xu, Sirui Han, Yike Guo. 1032-1041
- A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding. Zhenyang Liu, Sixiao Zheng, Siyu Chen, Cairong Zhao, Longfei Liang, Xiangyang Xue 0001, Yanwei Fu 0001. 1042-1051
- Rethinking Diffusion Bridge Model with Dual Alignments for Medical Image Synthesis. Jinbao Wei, Yuhang Chen, Zhijie Wang, Gang Yang, Shimin Tao, Jian Gao, Aiping Liu, Xun Chen 0001. 1052-1061
- CDIB: Consistency Discovery-guided Information Bottleneck for Multi-modal Knowledge Graph Reasoning. Haichuan Fang, Haoran Zhang, Yulin Du, Qiang Guo 0012, Zhen Tian, Youwei Wang, Yangdong Ye. 1062-1071
- Flexible Multi-view Clustering with Dynamic Views Generation. Yalan Qin, Nan Pu, Hanzhou Wu, Zhaoxin Fan. 1072-1081
- Residual Prior-driven Frequency-aware Network for Image Fusion. Zheng Guan, Xue Wang 0011, Wenhua Qian, Peng Liu, Runzhuo Ma. 1082-1091
- Clustering-Oriented Generative Attribute Graph Imputation. Mulin Chen, Bocheng Wang, Jiaxin Zhong, Zongcheng Miao, Xuelong Li 0001. 1092-1101
- DPFMVC: Dynamic Progressive Fusion for Multi-view Clustering. Taichun Zhou, Zhibin Dong, Siwei Wang 0001, Ke Liang 0006, Miaomiao Li 0001, Xinwang Liu 0002, En Zhu, Xiangjun Dong 0001. 1102-1111
- Discrepancy-Aware Attention Network for Enhanced Audio-Visual Generalized Zero-Shot Learning. Runlin Yu, Yipu Gong, Wenrui Li 0001, Aiwen Sun, Mengren Zheng. 1112-1121
- Unsupervised Cross-view Message Passing Method for Multi-view Graph Clustering. Ziming Quan, Penglei Wang, Danyang Wu, Jin Xu. 1122-1131
- X: Generalizable Dynamic Removal for NeRF and Gaussian Splatting SLAM. Mingrui Li, Dong Li, Sijia Hu, Kangxu Wang, Zhenjun Zhao, Hongyu Wang. 1132-1140
- Prior-oriented Anchor Learning with Coalesced Semantics for Multi-View Clustering. Jinjia Peng, Tianhang Cheng, Guangqi Jiang, Huibing Wang. 1141-1150
- CrosST: Cross Swin 4D Transformer for Multi-Modal Alzheimer's Detection. Hao Wang, Hanxiao Li, Li Xu. 1151-1160
- Domain-Specific Interactive Prompting for Generalized Nuclei Classification. Binbin Zheng, Aiqiu Wu, Kai Fan, Ao Li, Minghui Wang. 1161-1170
- Positional Prompt Tuning for Efficient 3D Representation Learning. Shaochen Zhang, Zekun Qi, Runpei Dong, Xiuxiu Bai, Xing Wei. 1171-1180
- Trusted Open-World Multi-View Classification with Dynamic Opinion Aggregation. Zhicheng Dong 0001, Xiaodong Yue, Yufei Chen 0002, Yuxian Zhou. 1181-1189
- Towards Universal Perception through Language-Guided Open-World Object Detection. Zihan Wang, Yunhang Shen, Yuan Fang, Zuwei Long, Ke Li, Xing Sun 0001, Jiao Xie, Shaohui Lin. 1190-1199
- Scalable Unpaired Multi-View Clustering via Anchor-Driven High-Throughput Encoding. Junyu Chen, Jiawei Peng, Yuan Sun 0016, Jian Dai, Xingfeng Li, Zhenwen Ren. 1200-1209
- Modal Symbiosis: Variational Alignment Unveils New Horizons in Multimodal Representation Learning. Zeyan Li, Cankun Guo, Yin Tang. 1210-1219
- Enhancing Multi-view Open-set Learning via Ambiguity Uncertainty Calibration and View-wise Debiasing. Zihan Fang, Zhiyong Xu, Lan Du, Shide Du, Zhiling Cai, Shiping Wang. 1220-1228
- Serial Over Parallel: Learning Continual Unification for Multi-Modal Visual Object Tracking and Benchmarking. Zhangyong Tang, Tianyang Xu 0001, Xuefeng Zhu 0003, Chunyang Cheng, Tao Zhou 0002, Xiaojun Wu, Josef Kittler. 1229-1238
- Deep Multi-Level Contrastive Clustering for Multi-Modal Remote Sensing Images. Weiqi Liu, Yongshan Zhang, Xinxin Wang 0003, Lefei Zhang. 1239-1247
- PREMISE: Individual Preference-aware Multi-modal Cooperation for Survival Prediction. Jiaqi Cui, Yilun Li, Xi Wu 0004, Jiliu Zhou, Yan Wang 0015. 1248-1257
- BridgeGLM: Bridging Graph and Language Spaces for Domain Generalization. Jiaxing Qi, Yifan Xu, Zhifei Yang 0004, Ruifei Ma, Chao Zhang, Kuifei Yu. 1258-1267
- Toward a Training-Free Plug-and-Play Refinement Framework for Infrared and Visible Image Registration and Fusion. Yating Liu, Yang Zou 0004, Xingyuan Li 0005, Xingyue Zhu, Kaiqi Han, Zhiying Jiang, Long Mau 0002, Jinyuan Liu 0001. 1268-1277
- Beyond Equal Views: Strength-Adaptive Evidential Multi-View Learning. Cai Xu, Ziqi Wen, Jie Zhao, Wanqing Zhao, Jinlong Yu, HaiShun Chen, Ziyu Guan, Wei Zhao 0019. 1278-1287
- RA-Touch: Retrieval-Augmented Touch Understanding with Enriched Visual Data. Yoorhim Cho, Hongyeob Kim, Semin Kim, Youjia Zhang, Yunseok Choi, Sungeun Hong. 1288-1297
- CRISP-SAM2: SAM2 with Cross-Modal Interaction and Semantic Prompting for Multi-Organ Segmentation. Xinlei Yu, Changmiao Wang, Hui Jin, Ahmed Elazab, Gangyong Jia, Xiang Wan, Changqing Zou, Ruiquan Ge. 1298-1307
- StitchFusion: Weaving Any Visual Modalities to Enhance Multimodal Semantic Segmentation. Bingyu Li, Da Zhang 0010, Zhiyuan Zhao 0005, Junyu Gao 0001, Xuelong Li 0001. 1308-1317
- Dual-Learning based Penalized Multi-Align Clustering for Multi-View Incomplete and Disorderly Data. Liang Zhao, Shubin Ma, Bo Xu 0008, Qingchen Zhang. 1318-1326
- HAMLET-FFD: Hierarchical Adaptive Multi-modal Learning Embeddings Transformation for Face Forgery Detection. Jialei Cui, Jianwei Du, Yanzhe Li, Lei Gao, Hui Jiang, Chenfu Bao. 1327-1336
- Geometric Gradient Divergence Modulation for Imbalanced Multimodal Learning. Disen Hu, Xun Jiang 0001, Zhe Sun 0009, Hao Yang, Chong Peng, Peng Yan, Heng Tao Shen, Xing Xu 0001. 1337-1345
- Ear with Eye: Lightweight Multimodal Audio-Visual Network Inspired by Bionic Structures. Xuanming Jiang, Baoyi An 0001, Zhengwei Zou, Dingyu Nie, Jialie Shen 0001, Xueming Qian, Guoshuai Zhao. 1346-1355
- Physics-Guided Sonar Image Fine-grained Recognition under Scarce Annotations. Chengzhou Li, Xiaokang Liu, Qi Jia 0001, Jinyuan Liu 0001, Zhiying Jiang, Longhan Feng, Yu Liu 0012, Zhongxuan Luo, Xin Fan. 1356-1365
- Conflict-Buffering Optimization by Symmetry Teleportation for Deep Long-Tailed Recognition. Mianzimei Yang, Zhipeng Zhou, Jin Zhang 0035, Yuanhao Pu, Hong Xie 0004, Defu Lian. 1366-1375
- 3T: Feature-Aware Adversarial Attacks for Multi-modal Tracking. Jiahao Wang 0002, Fang Liu 0001, Licheng Jiao, Hao Wang 0211, Shuo Li 0010, Lingling Li 0002, Puhua Chen, Xu Liu 0006, Xinyi Wang. 1376-1385
- PFDepth: Heterogeneous Pinhole-Fisheye Joint Depth Estimation via Distortion-aware Gaussian-Splatted Volumetric Fusion. Zhiwei Zhang 0005, Ruikai Xu, Weijian Zhang, Zhizhong Zhang 0001, Xin Tan 0002, Jingyu Gong, Yuan Xie 0006, Lizhuang Ma. 1386-1394
- HAFUNet: A Hierarchical Attention Fusion Network for Monocular Depth Estimation Integrating Event and Frame Data. Siyuan Zhang, Xiaoping Wang 0001, Jiang Li 0004, Weibin Feng, Xin Zhan, Hongzhi Huang. 1395-1403
- A Motion is Worth a Hybrid Sentence: Taming Language Model for Unified Motion Generation by Fine-grained Planning. Ronghui Li, Lingxiao Han, Shi Shu, Yueyao Liu, Yukang Lin, Yue Ma, Jie Guo, Ziwei Liu, Xiu Li 0001. 1404-1413
- Scalable One-step Unaligned Multi-view Clustering via Joint High-Order Correlation Learning. Hongyu Jiang, Yuxin Huo, Sirou Sheng, Hong Tao, Chenping Hou. 1414-1422
- Breaking Semantic Barriers: A Zero-Shot Generalized Framework for Graph Anomaly Detection. Xiangping Zheng, Xuan Feng, Bo Wu 0026, Bin Ren, Wei Li 0109, Xiuxin Hao, Xun Liang 0001, Bin Tang, Zhiwen Yu 0001. 1423-1432
- Segmenting Objectiveness and Task-awareness Unknown Region for Autonomous Driving. Mi Zheng, Guanglei Yang, Zitong Huang, Zhenhua Guo 0001, Kevin Han, Wangmeng Zuo. 1433-1442
- Infrared and Visible Image Fusion with Language-Driven Loss in CLIP Embedding Space. Yuhao Wang, Lingjuan Miao, Zhiqiang Zhou 0001, Lei Zhang, Yajun Qiao. 1443-1451
- DDFD: Diffusion-Based Denoising Fusion for Object Detection in Infrared-Visible Images. Min Dang, Gang Liu 0006, Jingqi Zhao, Adams Wai-Kin Kong, Nan Luo, Di Wang 0011. 1452-1461
- CDUPatch: Color-Driven Universal Adversarial Patch Attack for Dual-Modal Visible-Infrared Detectors. Jiahuan Long, Wen Yao, Tingsong Jiang, Jiacheng Hou, Shuai Jia, Junqi Wu 0002, Xiaoya Zhang, Xiaohu Zheng, Chao Ma 0004. 1462-1470
- Capturing More: Learning Multi-Domain Representations for Robust Online Handwriting Verification. Peirong Zhang, Kai Ding 0009, Lianwen Jin. 1471-1479
- Robust Multi-view Clustering via Pseudo Label Guided Universum Learning. Zhenxi Wang, Zongyao Yin, Yujie Hou, Xianchuan Yu. 1480-1489
- Multimodal Dual Population Evolutionary Reinforcement Learning. Yao Zhang, Ping Huang, Rui Zhang. 1490-1499
- Bridging the Unseen Gap: Label-Enhanced Information Bottleneck Distillation for Multimodal Named Entity Recognition. Bo Xu 0023, Jie Wei, Hongya Wang, Ming Du 0002, Hui Song, Yanghua Xiao. 1500-1509
- Exploring Multimodal Prompts For Unsupervised Continuous Anomaly Detection. Mingle Zhou, Jiahui Liu, Jin Wan, Gang Li 0005, Min Li 0033. 1510-1519
- BrainSegDMIF: A Dynamic Fusion-enhanced SAM for Brain Lesion Segmentation. Hongming Wang, Yifeng Wu, Huimin Huang, Hongtao Wu, Jiaxuan Jiang, Xiaodong Zhang, Hao Zheng, Yawen Huang, Xian Wu 0001, Yefeng Zheng 0001, Jinping Xu, Jing Cheng. 1520-1529
- Can LLMs Find Fraudsters? Multi-level LLM Enhanced Graph Fraud Detection. Tairan Huang, Yili Wang 0005, Qiutong Li, Changlong He, Jianliang Gao. 1530-1538
- Signal-SGN: A Spiking Graph Convolutional Network for Skeleton Action Recognition via Learning Temporal-Frequency Dynamics. Naichuan Zheng, Yuchen Du, Hailun Xia, Zeyu Liang. 1539-1548
- Art4Math: Handwritten Mathematical Expression Recognition via Multimodal Sketch Grounding. Yang Zhou, Jin Wang, Yuxiao Zhang, Kaixiang Huang, Guodong Lu, Jingru Yang, Shengfeng He. 1549-1558
- Frequency-refined Graph Convolution Network with Cross-modal Wavelet Denoising for Recommendation. Feiyu Peng, Chaobo He, Junwei Cheng, Huijuan Hu, Wenkai Zhang, Youda Mo. 1559-1568
- 2-SR: A Dual-Consistency Guided Curriculum Learning method for Thick-Slice Fetal MRI Super-Resolution. Chuan Zeng, Zhao Zhang, Wei Huang, Lei Zhang 0005, Le Yi, Kefu Zhao. 1569-1578
- BridgeNet: A Unified Multimodal Framework for Bridging 2D and 3D Industrial Anomaly Detection. An Xiang, Zixuan Huang, Xitong Gao, Kejiang Ye, Cheng-Zhong Xu 0001. 1579-1587
- MST-Distill: Mixture of Specialized Teachers for Cross-Modal Knowledge Distillation. Hui Li, Pengfei Yang, Juanyang Chen, Le Dong, Yanxin Chen, Quan Wang 0006. 1588-1597
- CausalMVC: Causal Content-Style Representation Learning for Deep Multi-View Clustering. Shifeng Bao, Zhe Xue, Qi Chen 0014, Shilong Ou, Amin Beheshti, Quan Z. Sheng, Anton van den Hengel, Yuankai Qi. 1598-1606
- SpecSolver: Solving Spatial-Spectral Fusion via Semantic Transformer. Wei Li 0034, Junwei Zhu, Honghui Xu 0002, Jiawei Jiang 0002, Jianwei Zheng 0001. 1607-1616
- Arbitrary-scale Fusion Neural Operator. Junwei Zhu, Wei Li 0034, Honghui Xu 0002, Jiawei Jiang 0002, Zhi Liu, Jianwei Zheng 0001. 1617-1626
- 2HDiffuser: Image Illumination Harmonization Meets the Diffusion Model. Zhongyun Bao, Gang Fu 0003, Jianchi Sun, Jing Zhou, Ziqi Yu, Chunxia Xiao. 1627-1636
- Visual Grounding with Attention-Driven Constraint Balancing. Weitai Kang, Luowei Zhou, Junyi Wu 0002, Changchang Sun, Yan Yan 0002. 1637-1645
- Rule Meets Learning: Confidence-Aware Multi-View Fusion for Self-Supervised 3D Hand Pose Estimation. Pengfei Ren 0001, Jingyu Wang 0001, Haifeng Sun 0001, Qi Qi 0001, Jing Wang, Jianxin Liao. 1646-1655
- Prior-Constrained Relevant Feature driven Image Fusion with Hybrid Feature via Mode Decomposition. Bingfeng Liu, Songwei Pei, Shuhuai Wang, Wenzheng Yang, Qian Li, Shangguang Wang. 1656-1665
- Regularizing Subspace Redundancy of Low-Rank Adaptation. Yue Zhu, Haiwen Diao, Shang Gao 0012, Jiazuo Yu 0001, Jiawen Zhu 0003, Yunzhi Zhuge, Shuai Hao 0007, Xu Jia 0012, Lu Zhang 0053, Ying Zhang 0021, Huchuan Lu. 1666-1675
- Anchors Bring Stability and Efficiency: Fast Tensorial Multi-view Clustering on Shuffled Datasets. Jintian Ji, Songhe Feng. 1676-1685
- Energy-based Deep Incomplete Multi-View Clustering. Ziyu Wang, Yiming Du, Rui Ning, Lusi Li. 1686-1694
- Neighbor Contrastive Learning with Weakened Consensus Graph for Deep Multi-View Clustering. Kai Zhu, Jun Yin. 1695-1703
- Try Harder: Hard Sample Generation and Learning for Cloth-Changing Person Re-ID. Hankun Liu, Yujian Zhao, Guanglin Niu. 1704-1713
- LargeMvC-Net: Anchor-based Deep Unfolding Network for Large-scale Multi-view Clustering. Shide Du, Chunming Wu, Zihan Fang, WenDi Zhao, Yilin Wu, Changwei Wang 0001, Shiping Wang. 1714-1723
- Cycle-Consistent Mamba-Based Registration-Fusion Joint Network for Unregistered Hyperspectral Image Super-ResolutionQuangui He, Jiahui Qu, Wenqian Dong, Song Xiao 0001, Qinghao Gao. 1724-1733 [doi]
- Event Consistency-aware Robust Fake News DetectionLiyuan Cao, Zihang Guo, Huaiwen Zhang. 1734-1743 [doi]
- Tree-of-Reasoning: Towards Complex Medical Diagnosis via Multi-Agent Reasoning with Evidence TreeQi Peng 0002, Jialin Cui, Jiayuan Xie, Yi Cai 0001, Qing Li 0001. 1744-1753 [doi]
- From Model Diagram to Code: A Benchmark Dataset and Multi-Agent FrameworkMengzhen Wang, Xunbin Huang, Jiayuan Xie, Shukai Ma, Jiale Men, Dayong Liang, Yi Cai 0001. 1754-1763 [doi]
- TrueCount: Improving Open-World Object Counting with Visual-Language Models and Dynamic Multi-Modal InputsZiqiang Shi, Rujie Liu, Jun Takahashi, Shan Jiang. 1764-1773 [doi]
- Radar-Mamba: 4D Millimeter-Wave Point Cloud Enhancement via State Space ModelsHong Gao, Xiangkai Xu, Tianqi Zhu, Xiugang Dong, Yiming Bao, Min-Ling Zhang. 1774-1782 [doi]
- MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Static QuantizationJiangyong Yu, Sifan Zhou, Dawei Yang, Shuoyu Li, Shuo Wang, Xing Hu 0010, Chen Xu, Zukang Xu, Changyong Shu, Zhihang Yuan. 1783-1792 [doi]
- KEN: Knowledge Augmentation and Emotion Guidance Network for Multimodal Fake News DetectionPeican Zhu, Yubo Jing, Le Cheng, Keke Tang, Yangming Guo. 1793-1801 [doi]
- Leader is Guided: Interactive Motion Generation via Lead-Follow Paradigm and Trajectory GuidanceRunqi Wang, Caoyuan Ma, Jian Zhao 0013, Hanrui Xu, Dongfang Sun, Haoyang Chen, Lin Xiong, Zheng Wang 0007, Xuelong Li 0001. 1802-1811 [doi]
- DGNS: Deformable Gaussian Splatting and Dynamic Neural Surface for Monocular Dynamic 3D ReconstructionXuesong Li 0001, Jinguang Tong, Jie Hong, Vivien Rolland, Lars Petersson. 1812-1821 [doi]
- Tensor-based Opposing yet Complementary Learning for Multi-view Multi-label Feature SelectionPingting Hao, Huijie Zhang, Yongshan Zhang. 1822-1831 [doi]
- LIDAR: Lightweight Adaptive Cue-Aware Fusion Vision Mamba for Multimodal Segmentation of Structural CracksHui Liu, Chen Jia, Fan Shi 0001, Xu Cheng 0003, Mengfei Shi, Xia Xie 0003, Shengyong Chen. 1832-1841 [doi]
- Implicit Retinex Decomposition with Chromaticity Disentanglement for Low-Light Image EnhancementMufan Liu, Wu Ran, Zhiquan He, Zuojie Xie, Hong Lu 0001, Peirong Ma. 1842-1851 [doi]
- Multi-modal Prototype Guided Few-shot Object DetectionChenbo Zhang, Bing Huangfu, Hongxu Ma, Jihong Guan, Shuigeng Zhou. 1852-1861 [doi]
- FAMRD: Frequency-Aware Multimodal Reverse Distillation for Industrial Anomaly DetectionQiyin Zhong, Xianglin Qiu, Xiaolei Wang, Zhen Zhang, Gang Liu, Jimin Xiao. 1862-1871 [doi]
- Tractography-Guided Dual-Label Collaborative Learning for Multi-Modal Cranial Nerves ParcellationLei Xie 0001, Junxiong Huang, Yuanjing Feng, Qingrun Zeng. 1872-1879 [doi]
- Boosting Multi-Modal Alignment: Geometric Feature Separation for Class Incremental LearningGuoqiang Liang 0001, Chuan Qin, De Cheng, Shizhou Zhang, Yanning Zhang 0001. 1880-1889 [doi]
- Freq-RWKV: Granularity-Aware Spatial-Frequency Synergy via Dual-Domain Recurrent Scanning for Pan-sharpeningXueheng Li, Xuanhua He, Tao Hu 0027, Jie Zhang 0033, Man Zhou 0003, Chengjun Xie, Yingying Wang 0005, Bo Huang 0001. 1890-1899 [doi]
- Discovering Maximum Frequency Consensus: Lightweight Federated Learning for Medical Image SegmentationLingren Wang, Wenxuan Tu, Jieren Cheng, Jianan Wang, Xiangyan Tang, Chenchen Wang. 1900-1909 [doi]
- Dual Teacher with Dempster-Shafer Guidance for Decision Making in Semi-Supervised Small Object DetectionNan Gao, Junchao Zhu, Yilong Zhang 0001, Ronghua Liang, Guodao Sun, Peng Chen. 1910-1919 [doi]
- Kinematic Enhanced Hypergraph Convolutional Network for Skeleton-based Human Action Recognition with LLM Training GuidesNan Ma 0008, Beining Sun, Yiheng Han, Genbao Xu. 1920-1928 [doi]
- Analytic Continual Test-Time Adaptation for Multi-Modality CorruptionYufei Zhang, Yicheng Xu, Hongxin Wei, Zhiping Lin 0001, Xiaofeng Zou, Cen Chen 0002, Huiping Zhuang. 1929-1937 [doi]
- TopoImages: Incorporating Local Topology Encoding into Deep Learning Models for Medical Image ClassificationPengfei Gu, Hongxiao Wang, Yejia Zhang, Huimin Li, Chaoli Wang 0001, Danny Chen 0001. 1938-1947 [doi]
- FreeCAD: A Multimodal Framework for 3D CAD Model Generation from Free-Form PromptsDawei Lin, Meng Yuan, Ziming Wang 0002, Tieru Wu, Yuanning Liu. 1948-1956 [doi]
- OIMGC-Net: Optimization-inspired Interpretable Multi-view Graph Clustering NetworkRenjie Lin, Jiacheng Li, Shide Du, Shiping Wang, Le Zhang 0001. 1957-1966 [doi]
- ElaSleepNet: Exploring an Elastic Multimodal Neural Network for Sleep Staging via Temporal and Contextual Consistency LearningQi Shen 0002, Junchang Xin, Bing Tian Dai, Shudi Zhang, Xinyao Liu, Zhiqiong Wang. 1967-1976 [doi]
- SALVG: Latent Variable Gene Augmented Graph Learning for Multi-View Clustering in Spatial TranscriptomicsZeyu Zhu, Ke Liang 0006, Lingyuan Meng, Xingchen Hu 0001, Xinwang Liu 0002, Wanwei Liu, Kunlun He. 1977-1986 [doi]
- Frequency Meets Semantics: Text-Visual Fusion with Directional Spectral Enhancement for Salient Object Detection in Optical Remote Sensing ImagesLamei Di, Bin Zhang, Yiming Wang, Wenxia Zhang. 1987-1996 [doi]
- Towards Explainable Fusion and Balanced Learning in Multimodal Sentiment AnalysisMiaosen Luo, Yuncheng Jiang, Sijie Mai. 1997-2006 [doi]
- MMF-SV: A Multi-Modal Feature Fusion-Based Structural Variant CallerZeyu Xia, Canqun Yang, Haoang Chi, Tao Tang 0001, Weiming Xiang, Yingbo Cui. 2007-2015 [doi]
- Zero in on the Target: A Composite Robust Model for Retrieving Information in Traffic Data to Discover Network AttacksZiang Li, Chengxiang Si, Zhenyu Cheng 0001. 2016-2025 [doi]
- Amplitude-aware Domain Style Replay for Lifelong Person Re-identificationLong Chen, De Cheng, Shizhou Zhang, Yinghui Xing, Di Xu, Yanning Zhang 0001. 2026-2035 [doi]
- HER2 Expression Prediction with Flexible Multi-Modal Inputs via Dynamic Bidirectional ReconstructionJie Qin, Wei Yang 0034, Yan Su, Yiran Zhu, Weizhen Li, Yunyue Pan, Chengchang Pan, Honggang Qi. 2036-2043 [doi]
- Disentangling Homophily and Heterophily in Multimodal Graph ClusteringZhaochen Guo, Zhixiang Shen, Xuanting Xie, Liangjian Wen, Zhao Kang 0001. 2044-2053 [doi]
- AV-RISE: Hierarchical Cross-Modal Denoising for Learning Robust Audio-Visual Speech RepresentationZhishuo Zhao, Yi Lin 0006, Dongyue Guo, Junyu Fan. 2054-2063 [doi]
- Cross-Modal Retrieval with Cauchy-Schwarz DivergenceJiahao Zhang, Wenzhe Yin, Shujian Yu. 2064-2073 [doi]
- LFMamba: Focal Stack-aware State Space Modeling for Light Field Salient Object DetectionXinbo Geng, Fan Shi 0001, Xu Cheng 0003, Chen Jia, Meng Zhao 0001, Shengyong Chen. 2074-2083 [doi]
- WFF: Wavelet-based Information Fusion for Multimodal Knowledge Graph Link PredictionXiaodi Xu, Lijie Li, Ye Wang 0021, Tao Ren 0001, Tian Qiao. 2084-2093 [doi]
- Breaking the Spatial-Temporal Consistency Constraint: Towards Reference-Based Hyperspectral Image Super-ResolutionXuyao Liu, Jiahui Qu, Wenqian Dong. 2094-2103 [doi]
- Visual-informed Silent Video Identity ConversionYifan Liu, Yu Fang, Zhouhan Lin. 2104-2112 [doi]
- Dynamic Optimization Noisy Cross-Modal HashingZebing Yao, Hao Fu, Yuanhang Yang, Guanghua Gu. 2113-2121 [doi]
- Multi-view Hashing ClassificationYuhang Lan, Shilin Xu 0003, Chao Su 0003, Run Ye, Dezhong Peng, Yuan Sun 0016. 2122-2130 [doi]
- Where Views Meet Curves: Virtual Anchors for Hyperbolic Multi-View Graph DiffusionJielong Lu, Zhihao Wu, Jiajun Yu, Qianqian Shen, Jiajun Bu, Haishuai Wang. 2131-2140 [doi]
- DiffuSeg: Diffusion-Enhanced Cross-Modal Semantic Segmentation for RGB-DJun Yang 0056, Maoyu Mao. 2141-2149 [doi]
- DA3D: Domain-Aware Dynamic Adaptation for All-Weather Multimodal 3D DetectionHaochen Yang 0002, Lei Li, Jiacheng Guo, Baolu Li, Minghai Qin, Hongkai Yu, Tianyun Zhang. 2150-2158 [doi]
- CM3AE: A Unified RGB Frame and Event-Voxel/-Frame Pre-training FrameworkWentao Wu, Xiao Wang 0014, Chenglong Li 0002, Bo Jiang 0002, Jin Tang 0001, Bin Luo 0001, Qi Liu 0003. 2159-2168 [doi]
- Multi-view Clustering Based on Probabilistic Tensor RegressionYichen Bao, Yuxuan Liu, Yu Duan, Jing Li, Quanxue Gao. 2169-2177 [doi]
- CalibWorkflow: A General MLLM-Guided Workflow for Centimeter-Level Cross-Sensor CalibrationXingchen Li, Wuyang Zhang, Guoliang You, Xiaomeng Chu, Wenhao Yu 0010, Yifan Duan, Yuxuan Xiao, Yanyong Zhang. 2178-2187 [doi]
- Towards Multi-Scenario Forecasting of Building Electricity Loads with Multimodal DataYongzheng Liu, Siru Zhong, Gefeng Luo, Weilin Ruan, Yuxuan Liang 0002. 2188-2196 [doi]
- Balanced Multiple Kernel Clustering with Discrete Partition Entropy Auto RegularizationYan Chen 0036, Bingbing Jiang 0001, Peng Zhou 0006, Lei Duan, Yuhua Qian, Liang Du 0003. 2197-2206 [doi]
- Robust Tensor Learning with Graph Diffusion for Scalable Multi-view Graph ClusteringJiale Zou, Yan Chen 0036, Bingbing Jiang 0001, Peng Zhou 0006, Liang Du 0003, Lei Duan, Yuhua Qian. 2207-2215 [doi]
- DyNAS-DDI: Dynamic Pairwise Architecture Search for Generalizable Drug-Drug Interaction LLMLinxin Xiao, Xin Wang 0019, Zeyang Zhang 0001, Yang Yao 0003, Wenwu Zhu 0001. 2216-2225 [doi]
- PLATO-TTA: Prototype-Guided Pseudo-Labeling and Adaptive Tuning for Multi-Modal Test-Time Adaptation of 3D SegmentationJianxiang Xie, Yao Wu, Yachao Zhang 0001, Xiaopei Zhang, Yuan Xie 0006, Yanyun Qu. 2226-2234 [doi]
- Context-aware Image-to-Music Generation via Bridging Modalities through Musical CaptionsShilin Liu, Kyohei Kamikawa, Keisuke Maeda, Takahiro Ogawa 0001, Miki Haseyama. 2235-2243 [doi]
- Federated Incomplete Multi-view Clustering with Individual Structure Preservation and Central Representation TensorizationYan Li, Xingchen Hu 0001, Jiyuan Liu 0003, Zhong Liu 0002. 2244-2253 [doi]
- Consistent and Invariant Generalization Learning for Short-video Misinformation DetectionHanghui Guo, Weijie Shi, Mengze Li 0001, Juncheng Li 0006, Hao Chen, Yue Cui 0001, Jiajie Xu 0001, Jia Zhu 0003, Jiawei Shen, Zhangze Chen, Sirui Han. 2254-2263 [doi]
- MAP: Parameter-Efficient Tuning for Referring Expression Comprehension via Multi-Modal Adaptive Positional EncodingRuilin Yao, Yi Rong, Tianyu Zou, Bo Zhang 0069, Jian Li 0062, Shengwu Xiong 0001, Shili Xiong. 2264-2273 [doi]
- HandCraft: Tactile-Informed Hand-Object Dynamics Capture and Realistic RenderingHongyang Lin, Kuixiang Shao, Peijun Xu, Zhuoyang Bu, Yuyang Jiao, Ziyuan Tang, Chenxi Xiao, Jingyi Yu 0002. 2274-2283 [doi]
- Physics-Coupled Frequency Dynamic Adaptation Network for Domain Generalized Underwater Object DetectionLinxuan Luo, Pan Mu, Cong Bai. 2284-2293 [doi]
- Multimodal Decomposed Distillation with Instance Alignment and Uncertainty Compensation for Thermal Object DetectionYanfeng Liu, Lefei Zhang. 2294-2303 [doi]
- Bi-Orthogonal Non-negative Tensor tri-Factorization for Tensorized Label LearningRui Wang, Yuxuan Liu, Guangyu Yang, Quanxue Gao, Cheng Deng 0002. 2304-2312 [doi]
- Multi-view Graph Clustering with Dual Structure Awareness for Remote Sensing DataXin Peng 0010, Bowen Liu, Renxiang Guan, Wenxuan Tu. 2313-2322 [doi]
- DeepMolTex: Deep Alignment of Molecular Graphs with Large Language Models via Mixture of Modality ExpertsMingliang Yan, Yanhua Yu, Ruochi Zhang, Zhiyuan Liu, Ruicheng Zhang, Yimeng Ren 0001, Kangkang Lu 0002, Zhiyong Huang 0010, Feng Luo, Zhen Cai. 2323-2332 [doi]
- DepthGait: Multi-Scale Cross-Level Feature Fusion of RGB-Derived Depth and Silhouette Sequences for Robust Gait RecognitionXinzhu Li, Juepeng Zheng, Yikun Chen, Xudong Mao, Guanghui Yue 0001, Wei Zhou 0021, Chenlei Lv, Ruomei Wang 0001, Fan Zhou 0001, Baoquan Zhao. 2333-2341 [doi]
- Anatomical Region-Guided 3D PET/MR Tumor Segmentation via Medical RecordTianming Xu, Tiantian Guo, Youdan Feng, Zihan Chen, Qiaoyi Xue, Lingzhi Hu, Yuhang Shi. 2342-2351 [doi]
- A Language-Assisted Semantic-Aware Disentangled Method for Link Prediction on Heterogeneous GraphsRongqiang Fang, Yongqi Sun, Jidong Yuan, Hongbo Cao, Jinkun Dong. 2352-2361 [doi]
- PgM: Partitioner Guided Modal Learning FrameworkGuimin Hu, Yi Xin 0003, Lijie Hu, Zhihong Zhu, Hasti Seifi. 2362-2371 [doi]
- Label-Semantics-Guided Multi-View Multi-Label Learning via High-Order Semantic FusionKaixiang Wang 0001, Xiaojian Ding, Wanqi Yang, Ming Yang 0014. 2372-2380 [doi]
- UniMTR: Unified Recognition of Dual-style Traditional Mongolian Scripts via Contrastive Representation AlignmentChenyang Zhou 0003, Monghjaya Ha, Chao Tang, Licheng Wu. 2381-2389 [doi]
- Towards Measuring and Modeling Geometric Structures in Time Series Forecasting via Image ModalityMingyang Yu, Xiahui Guo, Peng Chen, Zhenkai Li, Yang Shu. 2390-2398 [doi]
- ESTJ: Enhancing Structured Tendency Judgment in Hybrid-Modal Table UnderstandingShu-Xun Yang, Xian-Ling Mao, Heyan Huang. 2399-2408 [doi]
- UniRGB-IR: A Unified Framework for Visible-Infrared Semantic Tasks via Adapter TuningMaoxun Yuan, Bo Cui, Tianyi Zhao, Jiayi Wang, Shan Fu, Xue Yang, Xingxing Wei 0001. 2409-2418 [doi]
- M2PE-Diff: Music-to-Pose Encoder for Dance Video Generation Leveraging Latent Diffusion FrameworkNokap Tony Park. 2419-2428 [doi]
- A Theoretical Proof of Dynamic Multimodal Fusion Exacerbates Modality GreedyXiaorui Ding, Huan Ma 0006, Changqing Zhang 0002. 2429-2436 [doi]
- Court of LLMs: Evidence-Augmented Generation via Multi-LLM Collaboration for Text-Attributed Graph Anomaly DetectionYiming Xu 0001, Jiarun Chen, Zhen Peng 0005, Zihan Chen, Qika Lin, Lan Ma, Bin Shi, Bo Dong 0001. 2437-2446 [doi]
- Find True Collaborators: Banzhaf Index-based Cross View Alignment for Partially View-aligned ClusteringShanghui Deng, Xiao Zheng, Chang Tang, Kun Sun, Yuanyuan Liu 0004, Xinwang Liu 0002. 2447-2456 [doi]
- Deep Variational Incomplete Multi-View Clustering with Information-Theoretic GuidanceWenlan Chen, Lu Gao, Cheng Liang 0001, Fei Guo 0001. 2457-2466 [doi]
- Evidential Remote Physiological Measurement via Uncertainty-aware Fusion of Video and RFJieyi Ge, Zhaodong Sun, Wei Peng 0009, Chenhang Ying, Yuwei Chen, Kui Ren 0001, Xiaobai Li. 2467-2475 [doi]
- Dual-Level Distribution Alignment for Deep Incomplete Multi-View ClusteringFujian Ren, Wenlan Chen, Lu Gao, Fei Guo 0001, Cheng Liang 0001. 2476-2485 [doi]
- Entity Graph Alignment and Visual Reasoning for Multimodal Fake News DetectionGuoyi Li, Die Hu 0004, Xiaomeng Fu, Qirui Tang, Yulei Wu, Xiaodan Zhang 0004, Honglei Lyu. 2486-2495 [doi]
- Visual-Enhanced Multimodal Framework for Flexible Job Shop Scheduling ProblemPeng Zhao 0018, Zhiguang Cao, Di Wang 0004, Wen Song 0004, Wei Pang 0001, You Zhou 0008, Yuan Jiang 0007. 2496-2505 [doi]
- Dark Side of Modalities: Reinforced Multimodal Distillation for Multimodal Knowledge Graph ReasoningYu Zhao 0043, Ying Zhang 0015, Xuhui Sui, Baohang Zhou, Haoze Zhu, Jeff Z. Pan, Xiaojie Yuan. 2506-2515 [doi]
- CROP: Integrating Topological and Spatial Structures via Cross-View Prefixes for Molecular LLMsJianting Tang, Yubo Wang 0010, Haoyu Cao 0001, Linli Xu. 2516-2525 [doi]
- Contextually-Guided State Space Fusion for Misaligned Multi-Spectral Object DetectionGuyue Jin, Tianming Zhao 0003, Jiacan Yan, Tian Tian 0006. 2526-2535 [doi]
- Graph Canvas for Controllable 3D Scene GenerationLibin Liu, Shen Chen, Sen Jia 0003, Jingzhe Shi, Can Jin, Zongkai Wu, Jenq-Neng Hwang, Lei Li 0050. 2536-2545 [doi]
- MM-HSD: Multi-Modal Hate Speech Detection in VideosBerta Céspedes-Sarrias, Carlos Collado-Capell, Pablo Rodenas-Ruiz, Olena Hrynenko, Andrea Cavallaro. 2546-2555 [doi]
- Joint Test-time Adaptation with Refined Pseudo-labels and Latent Score MatchingYijie Yang, Lianyong Qi, Weiming Liu 0005, Fan Wang 0020, Jing Du, Yuwen Liu 0003, Xiaolong Xu 0001, Qiang Ni, Wanchun Dou, Xiaokang Zhou. 2556-2565 [doi]
- CLIP-6D: Empowering CLIP as a Zero-Shot 6D Pose Estimator Through Generalizable Object-Specific RepresentationsHua Wang, Hong Liu 0008, Jiale Ren, Mingxin Tan, Zhongzien Jiang. 2566-2575 [doi]
- AeroDuo: Aerial Duo for UAV-based Vision and Language NavigationRuipu Wu, Yige Zhang, Jinyu Chen, Linjiang Huang, Shifeng Zhang, Xu Zhou, Liang Wang 0001, Si Liu 0001. 2576-2585 [doi]
- EventVAD: Training-Free Event-Aware Video Anomaly DetectionYihua Shao, Haojin He, Sijie Li, Siyu Chen, Xinwei Long, Fanhu Zeng, Yuxuan Fan, Muyang Zhang, Ziyang Yan, Ao Ma, Xiaochen Wang, Hao Tang 0005, Yan Wang 0105, Shuyan Li. 2586-2595 [doi]
- SAM based Region-Word Clustering and Inference Score Adjusting for Open-Vocabulary Object DetectionQiuyu Liang, Yongqiang Zhang. 2596-2605 [doi]
- CheXPO: Preference Optimization for Chest X-ray VLMs with Counterfactual RationaleXiao Liang, Jiawei Hu, Di Wang 0011, Zhi Ma, Lin Zhao 0003, Ronghan Li, Bo Wan 0002, Quan Wang 0006. 2606-2615 [doi]
- Temporal-coded Spiking TransformerQian Sun 0014, Chengzhuo Lu, Wenyu Chen 0001, Wenjie Wei, Jingya Wang, Jieyuan Zhang, Xiaoli Liu, Yalan Ye, Yang Yang 0060, Malu Zhang. 2616-2624 [doi]
- Domain-aware Visual Context Prompt for Multi-Source Domain AdaptationYuwu Lu, Haoyu Huang, Xue Hu. 2625-2633 [doi]
- CEARI: Co-Evolutionary Agents for Reassembling and Inpainting Puzzles with Gaps and Missing PiecesXingke Song, Jianxu Shangguan, Yiran Li 0003, Jialu Zhang, Jianfeng Ren, Ruibin Bai, Xin Chen, Xudong Jiang 0001. 2634-2642 [doi]
- Hierarchical Meta-prototypes Network for Few-shot Action RecognitionXiaoyu Chen, Yigang Cen, Wanru Xu, Yue Zhang 0065, Yi Jin 0001, Yidong Li, Linna Zhang. 2643-2652 [doi]
- Domain Crossover Non-Rigid Registration for 3D Human MeshesKyungjune Lee, Seongjean Kim, Hoseok Tong, Hyucksang Lee, Seongmin Lee 0002, Weisi Lin, Ping An, Sanghoon Lee 0001. 2653-2662 [doi]
- Advancing Complex Wide-Area Scene Understanding with Hierarchical Coresets SelectionJingyao Wang, Yiming Chen, Lingyu Si, Changwen Zheng. 2663-2672 [doi]
- OoDDINO: A Multi-level Framework for Anomaly Segmentation on Complex Road ScenesYuxing Liu, Ji Zhang, Xuchuan Zhou, Jingzhong Xiao, Huimin Yang, Jiaxin Zhong. 2673-2682 [doi]
- SynC: Synthetic Image Caption Dataset Refinement with One-to-many Mapping for Zero-shot Image CaptioningSi-Woo Kim, MinJu Jeon, Ye Chan Kim, Soeun Lee, Taewhan Kim, Dong Jin Kim. 2683-2692 [doi]
- Enhancing Multi-task Learning Capability of Medical Generalist Foundation Model via Image-centric Multi-annotation DataXun Zhu, Fanbin Mo, Zheng Zhang, Jiaxi Wang, Yiming Shi, Ming Wu 0001, Chuang Zhang, Miao Li 0003, Ji Wu. 2693-2702 [doi]
- DSS-Prompt: Dynamic-Static Synergistic Prompting for Few-Shot Class-Incremental LearningLinpu He, Yanan Li 0002, Bingze Li, Elvis Han Cui, Donghui Wang. 2703-2712 [doi]
- Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive ReasoningYian Li, Wentao Tian, Yang Jiao, Tianwen Qian, Na Zhao, Bin Zhu 0006, Jingjing Chen 0001, Yu-Gang Jiang 0001. 2713-2722 [doi]
- Learning Hierarchical Cross-modal Association with Intra-modal Context for Text-Image Person RetrievalYifei Deng, Chenglong Li 0002, Futian Wang, Jin Tang 0001. 2723-2731 [doi]
- SGM-Transformer: Rethinking Gradient Information Loss and Compensation in Spiking Neural NetworksXiubo Liang, Hongzhi Wang 0009, Zigen Li, Jinxing Han, Yu Zhao, Weidong Geng. 2732-2741 [doi]
- MediSee: Reasoning-Based Pixel-Level Perception in Medical ImagesQinyue Tong, Ziqian Lu, Jun Liu, Yangming Zheng, Zhe-Ming Lu 0001. 2742-2751 [doi]
- Progressive Representation Learning for Weakly-Supervised Camouflaged Object DetectionShuyong Gao, Qianyu Guo, Yu'ang Feng, Chunyuan Chen, Xujun Wei, Yan Wang, Wenqiang Zhang. 2752-2761 [doi]
- EgoPrompt: Prompt Learning for Egocentric Action RecognitionHuaihai Lyu, Chaofan Chen, Yuheng Ji, Changsheng Xu. 2762-2770 [doi]
- CWCP: Generalizing Virtual Reality to Real World with Contextual-Weather Correlation Pairing for Deraining and DesnowingYuwu Lu, Chunzhi Liu, Yihan Yang. 2771-2780 [doi]
- HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented GenerationPei Liu, Xin Liu, Ruoyu Yao, Junming Liu, Siyuan Meng, Ding Wang, Jun Ma 0008. 2781-2790 [doi]
- DichotomyIR: Universal Image Reconstruction via Dichotomy Classification and Uncertainty EliminationYan Zhang 0108, Shiwen He, Lin Yuan 0002, Jiaxu Leng, Xinbo Gao 0001. 2791-2800 [doi]
- Dynamic Scoring with Enhanced Semantics for Training-Free Human-Object Interaction DetectionFrancesco Tonini, Lorenzo Vaquero, Alessandro Conti, Cigdem Beyan, Elisa Ricci 0001. 2801-2810 [doi]
- Enhanced Motion-aware Latent Diffusion Models for Video Frame InterpolationZhilin Huang, Chujun Qin, Yifei Xing 0001, Wenming Yang. 2811-2820 [doi]
- 3DAffordSplat: Efficient Affordance Reasoning with 3D GaussiansZeming Wei, Junyi Lin, Yang Liu 0084, Weixing Chen, Jingzhou Luo, Guanbin Li, Liang Lin. 2821-2830 [doi]
- BiMa: Towards Biases Mitigation for Text-Video Retrieval via Scene Element GuidanceHuy Le 0001, Nhat Chung, Tung Kieu, Anh Nguyen 0003, Ngan Le. 2831-2840 [doi]
- What You Perceive Is What You Conceive: A Cognition-Inspired Framework for Open Vocabulary Image SegmentationJianghang Lin, Yue Hu, Jiangtao Shen, Yunhang Shen, Liujuan Cao, Shengchuan Zhang, Rongrong Ji. 2841-2850 [doi]
- Dynamic Self-adaptive Multiscale Distillation from Pre-trained Multimodal Large Model for Efficient Cross-modal RetrievalZhengyang Liang, MeiYu Liang, Wei Huang, Yawen Li 0001, Wu Liu, Yingxia Shao, Kangkang Lu 0002. 2851-2859 [doi]
- Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMsTiancheng Gu, Kaicheng Yang 0002, Ziyong Feng, Xingjun Wang, Yanzhao Zhang, Dingkun Long, Yingda Chen, Weidong Cai 0001, Jiankang Deng. 2860-2869 [doi]
- CIA: Class- and Instance-aware Adaptation for Vision-Language ModelsLin Peng 0003, Cong Wan, Shaokun Wang, Xiang Song 0005, Yuhang He, Yihong Gong. 2870-2879 [doi]
- Visual Instance-aware Prompt TuningXi Xiao, Yunbei Zhang, Xingjian Li 0002, Tianyang Wang 0004, Xiao Wang 0004, Yuxiang Wei, Jihun Hamm, Min Xu 0009. 2880-2889 [doi]
- DualFPT: Handling Data Heterogeneity in Federated Prompt Tuning from both Generalized and Personalized PerspectiveYuliang Chen, Xi Lin 0003, Chao Sang, Xiu Su. 2890-2899 [doi]
- Counting by Points: Density-Guided Weakly-Supervised Nuclei Segmentation in Histopathological ImagesLingbo Zhang, Bingqian Sun, Linghan Cai, Yifeng Wang 0001, Ye Zhang 0008, Songhan Jiang, Kai Zhang 0012, Yongbing Zhang 0002. 2900-2908 [doi]
- FineQuest: Adaptive Knowledge-Assisted Sports Video Understanding via Agent-of-Thoughts ReasoningHaodong Chen, Haojian Huang, Xinxiang Yin, Dian Shao. 2909-2918 [doi]
- Cross-Modal Dual-Causal Learning for Long-Term Action RecognitionShaowu Xu, XiBin Jia, Junyu Gao 0002, Qianmei Sun, Jing Chang 0006, Chao Fan. 2919-2928 [doi]
- Novel Category Discovery with X-Agent Attention for Open-Vocabulary Semantic SegmentationJiahao Li, Yang Lu 0009, Yachao Zhang 0001, Fangyong Wang, Yuan Xie 0006, Yanyun Qu. 2929-2938 [doi]
- VLHP: Learning Discriminative Vision-Language Hybrid Prototypes for Weakly Supervised Semantic SegmentationJingyuan Fang, Yang Ning, Xiushan Nie, Xinfeng Liu, Zhiyong Cheng 0001. 2939-2948 [doi]
- DREAM: Document Reconstruction via End-to-end Autoregressive ModelXin Li 0118, Mingming Gong, Yunfei Wu, Jianxin Dai, Antai Guo, Xinghua Jiang, Haoyu Cao 0001, Yinsong Liu, Deqiang Jiang, Xing Sun 0001. 2949-2957 [doi]
- Self-Supervised Anatomical Consistency Learning for Vision-Grounded Medical Report GenerationLongzhen Yang, Zhangkai Ni, Ying Wen, Yihang Liu, Lianghua He, Heng Tao Shen. 2958-2967 [doi]
- Scaling Laws for Data-Efficient Visual Transfer LearningWenxuan Yang, Qingqv Wei, Chenxi Ma, Weimin Tan, Bo Yan. 2968-2976 [doi]
- Lightweight Medical Image Restoration via Integrating Reliable Lesion-Semantic Driven PriorPengcheng Zheng, Kecheng Chen, Jiaxin Huang 0006, Bohao Chen, Ju Liu, Yazhou Ren 0001, Xiaorong Pu. 2977-2986 [doi]
- InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofingKun-Hsiang Lin, Yu-Wen Tseng, Kang-Yang Huang, Jhih-Ciang Wu, Wen-Huang Cheng. 2987-2996 [doi]
- Test-Time Adaptation for Text-Based Person SearchKai Niu 0002, Liucun Shi, Ke Han, Qinzi Zhao, Yue Wu, Yanning Zhang 0001. 2997-3006 [doi]
- PAF: Prototype Adaptive Fusion for Test-Time Adaptation of Vision-Language ModelsSi Chen, Yujia Chen, Xiaotian Yin, Xin Liu, Huakai Lai, Tianzhu Zhang 0001. 3007-3016 [doi]
- Exploring Fourier Prior and Event Collaboration for Low-Light Image EnhancementChunyan She, Fujun Han, Chengyu Fang 0001, Shukai Duan 0001, Lidan Wang 0001. 3017-3026 [doi]
- RemoteSAM: Towards Segment Anything for Earth ObservationLiang Yao, Fan Liu, Delong Chen, Chuanyi Zhang, Yijun Wang, Ziyun Chen, Wei Xu, Shimin Di, Yuhui Zheng. 3027-3036 [doi]
- Gen4Track: A Tuning-free Data Augmentation Framework via Self-correcting Diffusion Model for Vision-Language TrackingJiawei Ge 0002, Xinyu Zhang, Jiuxin Cao, Xuelin Zhu, Weijia Liu, Qingqing Gao, Biwei Cao, Kun Wang 0057, Chang Liu 0113, Bo Liu 0004, Chen Feng 0028, Ioannis Patras. 3037-3046 [doi]
- SLGaussian: Fast Language Gaussian Splatting in Sparse ViewsKangjie Chen, Bingquan Dai, Minghan Qin, Dongbin Zhang, Peihao Li 0003, Yingshuang Zou, Haoqian Wang. 3047-3056 [doi]
- GeoUni: A Unified Model for Generating Geometry Diagrams, Problems and Problem SolutionsJo-Ku Cheng, Zeren Zhang, Ran Chen 0002, Jingyang Deng, Ziran Qin, Jinwen Ma. 3057-3066 [doi]
- MM-Prompt: Multi-modality and Multi-granularity Prompts for Few-Shot SegmentationHang Xiong, Runmin Cong, Jinpeng Chen 0003, Chen Zhang 0013, Feng Li 0037, Huihui Bai 0001, Sam Kwong. 3067-3075 [doi]
- Activation Shape Matters: OOD Detection with Norm-Entropy FusionJiawei Gu, Ziyue Qiao, Zechao Li. 3076-3084 [doi]
- Semantics-Driven Contrastive Learning for Real-World Depth Super ResolutionXinchen Ye, Aokai Zhang, Rui Xu 0002. 3085-3093 [doi]
- SeqVLM: Proposal-Guided Multi-View Sequences Reasoning via VLM for Zero-Shot 3D Visual GroundingJiawen Lin, Shiran Bian, Yihang Zhu, Wenbin Tan, Yachao Zhang 0001, Yuan Xie 0006, Yanyun Qu. 3094-3103 [doi]
- The Overlooked Matters: Revisiting Background, Prototype, and Activation in Few-Shot Medical Image SegmentationYucheng Shu, Yaohui Wang, Lihong Qiao, Feiyan Li, Bin Xiao, Weisheng Li 0001, Xinbo Gao 0001. 3104-3113 [doi]
- Mitigating Delivery Artifacts in Real-World Video Super-ResolutionJiaxin Peng, Siwang Zhou, Chengqing Li, Yucheng Li, Dunyun Chen. 3114-3123 [doi]
- Decoupling Dense Video Captioning via Task-specific PromptsWei Chen, Jianwei Niu 0002, Xuefeng Liu 0001, Xinghao Wu. 3124-3132 [doi]
- Semantic-Aware Hard Negative Mining for Medical Vision-Language Contrastive PretrainingYongxin Li, Ying Cheng 0005, Yaning Pan, Wen He, Qing Wang, Rui Feng 0001, Xiaobo Zhang. 3133-3142 [doi]
- MIHBench: Benchmarking and Mitigating Multi-Image Hallucinations in Multimodal Large Language ModelsJiale Li, Mingrui Wu, Zixiang Jin, Hao Chen, Jiayi Ji, Xiaoshuai Sun, Liujuan Cao, Rongrong Ji. 3143-3152 [doi]
- FATE: A Prompt-Tuning-Based Semi-Supervised Learning Framework for Extremely Limited Labeled DataHezhao Liu, Yang Lu 0009, Mengke Li 0001, Yiqun Zhang 0006, Shreyank N. Gowda, Chen Gong 0002, Hanzi Wang. 3153-3162 [doi]
- InstructStep: Fine-Grained Localization of Step Content and Relation in Instructional VideoWangsheng He, Wanru Xu, Ping Guo, Zhenjiang Miao, Yi Tian. 3163-3172 [doi]
- Deciphering Functions of Neurons in Vision-Language ModelsJiaqi Xu, Cuiling Lan, Yan Lu 0001. 3173-3181 [doi]
- Can Person-Level Attributes Improve Group Re-Identification?Kamakshya Prasad Nayak, Kamalakar Vijay Thakare, Ashesh Xalxo, Lalit Lohani, Debi Prosad Dogra. 3182-3191 [doi]
- Seeing the Overlooked: Bio-Visual Inspired Weak Saliency Feedback Transformer for Person Re-identificationChangshuo Wang 0001, Shuting He, Xiang Fang, Fangzhe Nan, Prayag Tiwari. 3192-3201 [doi]
- HRSeg: High-Resolution Visual Perception and Enhancement for Reasoning SegmentationWeihuang Lin, Yiwei Ma, Xiaoshuai Sun, Shuting He, Jiayi Ji, Liujuan Cao, Rongrong Ji. 3202-3211 [doi]
- KAID: Knowledge-Aware Interactive Distillation for Vision-Language ModelsDa Zhang 0010, Feiyu Wang, Bingyu Li, Zhiyuan Zhao 0005, Junyu Gao 0001, Xuelong Li 0001. 3212-3221 [doi]
- A Filtering Framework for Semi-online Referring Video Object SegmentationXiao Hu 0008, Heiko Neumann, Jochen Lang 0001. 3222-3231 [doi]
- StoryCrafter: Instance-Aligned Multi-Character Storytelling with Diffusion Policy LearningRuiqi Dong, Wenjing Pang, Chenjie Pan, Hengyang Lu, Chenyou Fan. 3232-3241 [doi]
- Contrastive Lie Algebra Learning for Ultra-Fine-Grained Visual CategorizationXiaohan Yu 0001, Zicheng Pan, Yang Zhao 0002, Qin Zhang, Yongsheng Gao 0001. 3242-3250 [doi]
- Decoupled Global-Local Alignment for Improving Compositional UnderstandingXiaoxing Hu, Kaicheng Yang 0002, Jun Wang, Haoran Xu, Ziyong Feng, Yupei Wang. 3251-3260 [doi]
- EchoVim: Making Vision Mamba Docile for Echocardiography Video Segmentation via Dynamic Interaction and Semantic Token-attentive RefinementJingxing Guo, Guilian Chen, Yimu Sun, Huisi Wu, Jing Qin. 3261-3269 [doi]
- Towards Space and Semantics: Object-Purified Representation Learning for Multi-Label Image ClassificationHaifeng Zhao 0001, Shuo Xu, Leilei Ma 0002, Yufei Zhang, Lei Wang 0095, Dengdi Sun. 3270-3279 [doi]
- Building Embodied EvoAgent: A Brain-inspired Paradigm for Bridging Multimodal Large Models and World ModelsJunyu Gao 0002, Xuan Yao 0001, Yong Rui, Changsheng Xu. 3280-3289 [doi]
- Unveiling Open-set Noise: Theoretical Insights into Label NoiseChen Feng, Nicu Sebe, Georgios Tzimiropoulos, Miguel R. D. Rodrigues, Ioannis Patras. 3290-3299 [doi]
- Character-Centric Understanding of Animated MoviesZhongrui Gui, Junyu Xie, Tengda Han, Weidi Xie, Andrew Zisserman. 3300-3309 [doi]
- See Different, Think Better: Visual Variations Mitigating Hallucinations in LVLMsZiyun Dai, Xiaoqiang Li 0002, Shaohua Zhang, Yuanchen Wu, Jide Li. 3310-3319 [doi]
- Multi-round Mutual Emotion-Cause Pair Extraction for Emotion-Attributed Video CaptioningCheng Ye, Weidong Chen 0013, Peipei Song, Xinyan Liu, Lei Zhang, Zhendong Mao 0001. 3320-3329 [doi]
- Target-Guided Bayesian Flow Networks for Quantitatively Constrained CAD GenerationWenhao Zheng 0002, Chenwei Sun, Wenbo Zhang, Jiancheng Lv, Xianggen Liu. 3330-3339 [doi]
- Quantifying Samples with Invariance for Source-Free Class Incremental Domain AdaptationZhiYu Ye, Guowen Li, Haoyuan Liang, Zixi Wang, Shilei Cao 0005, Yushan Lai, Juepeng Zheng. 3340-3349 [doi]
- MINDEV: Multi-modal Integrated Diffusion Framework for Video Reconstruction from EEG SignalsShuai Huang, Yongxiong Wang, Huan Luo, Haodong Jing, Chendong Qin, Jingqun Tang. 3350-3359 [doi]
- Balancing Cross-Modal Attention for Generalized Zero-Shot LearningZhijie Rao, Jingcai Guo. 3360-3369 [doi]
- Beyond Visual Quality: Fidelity-Oriented Diffusion Model for Real-world Image Super-ResolutionZhenxuan Fang, Shuaibo Wang, Weisheng Dong, Junwei Xu, Fangfang Wu, Xin Li 0005, Guangming Shi. 3370-3379 [doi]
- Reversible Privacy Preserving on Vision-Language Models via Adversarial Multimodal KeyPeng Ying, Zhongnian Li, Meng Wei 0006, Xinzheng Xu. 3380-3389 [doi]
- Evaluating the Evaluators: Towards Human-aligned Metrics for Missing Markers ReconstructionTaras Kucherenko, Derek Peristy, Judith Bütepage. 3390-3398 [doi]
- B4DL: A Benchmark for 4D LiDAR LLM in Spatio-Temporal UnderstandingChangho Choi, Youngwoo Shin, Gyojin Han, Dong-Jae Lee, Junmo Kim 0002. 3399-3407 [doi]
- Mobile U-ViT: Revisiting large kernel and U-shaped ViT for efficient medical image segmentationFenghe Tang, Bingkun Nian, Jianrui Ding, Wenxin Ma, Quan Quan, Chengqi Dong, Jie Yang 0002, Wei Liu 0044, S. Kevin Zhou. 3408-3417 [doi]
- TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary GenerationLing You, Wenxuan Huang 0001, Xinni Xie, Xiangyi Wei, Bangyan Li, Shaohui Lin, Yang Li, Changbo Wang. 3418-3427 [doi]
- Sentence-level Segmentation for Long Sign Language Videos with CaptionsBowen Guo, Shiwei Gan, Yafeng Yin 0002, Xiao Liu 0043, Zhiwei Jiang 0001, Shunmei Meng. 3428-3437 [doi]
- 3: Dual-Modal Counterfactual Contrastive Construction for Egocentric Video Question AnsweringJiayi Zou, Chaofan Chen, Bing-Kun Bao, Changsheng Xu. 3438-3447 [doi]
- City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete LearningPenglei Sun, Yaoxian Song, Xiangru Zhu, Xiang Liu, Qiang Wang 0022, Yue Liu, Changqun Xia, Tiefeng Li, Yang Yang 0001, Xiaowen Chu 0001. 3448-3457 [doi]
- CoFiVLA: Synergistic Coarse-Fine Vision-Language Alignment for Image Aesthetic AssessmentYuzhen Niu, Siling Chen, Yuzhong Chen 0001, Fusheng Li, Rui Xu 0028, Hui Da. 3458-3467 [doi]
- FlowTrack: Integrating Adjacent-Frame Motion Tracking and Adaptive Prediction for Robust Semi-Supervised VOSDuolin Wang, Guanyu Xing, Yanli Liu 0002. 3468-3476 [doi]
- Differential Contrastive Training for Gaze EstimationLin Zhang, Yi Tian, Xiyun Wang, Wanru Xu, Yi Jin 0001, Yaping Huang. 3477-3486 [doi]
- RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation ParadigmTiancheng Gu, Kaicheng Yang 0002, Chaoyi Zhang, Yin Xie, Xiang An, Ziyong Feng, Dongnan Liu, Weidong Cai 0001, Jiankang Deng. 3487-3496 [doi]
- Adaptive Neighbors and Uncertainty Estimation for Source-Free Unsupervised Domain Adaptation with Noisy LabelsYanting Pei, Fan Yang. 3497-3506 [doi]
- EditEval: Towards Comprehensive and Automatic Evaluation for Text-guided Video EditingBingshuai Liu, Ante Wang, Zijun Min, Chenyang Lyu, Longyue Wang, ZhiHao Wang, Xu Han, Peng Li, Jinsong Su. 3507-3516 [doi]
- FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated VideosRui Chen, Lei Sun, Jing Tang, Geng Li, Xiangxiang Chu. 3517-3526 [doi]
- VLM-based Prompts as the Optimal Assistant for Unpaired Histopathology Virtual StainingZizhi Chen, Xinyu Zhang, Minghao Han, Yizhou Liu 0002, Ziyun Qian, Weifeng Zhang, Xukun Zhang, Jingwei Wei, Lihua Zhang 0002. 3527-3536 [doi]
- PatchWiper: Leveraging Dynamic Patch-Wise Parameters for Real-World Visible Watermark RemovalZihao Mo, Junye Chen, Chaowei Fang, Guanbin Li. 3537-3545 [doi]
- DFGAP: Towards Depth-Free Cross-Category GAParts Perception via Uncertainty-Quantified ModelingXueyu Yuan, Jiarui Zhang, Jiangqi Song, Liu Liu 0012, Li Zhang 0104, Dan Guo 0001, Richang Hong, Meng Wang 0002. 3546-3554 [doi]
- DHCP: Detecting Hallucinations by Cross-modal Attention Pattern in Large Vision-Language ModelsYudong Zhang 0008, Ruobing Xie, Xingwu Sun, Yiqing Huang, Jiansheng Chen 0001, Zhanhui Kang, Di Wang, Yu Wang 0002. 3555-3564 [doi]
- Knowledge Regularized Negative Feature Tuning of Vision-Language Models for Out-of-Distribution DetectionWenjie Zhu, Yabin Zhang, Xin Jin 0014, Wenjun Zeng 0001, Lei Zhang 0006. 3565-3574 [doi]
- Short-LVLM: Compressing and Accelerating Large Vision-Language Models by Pruning Redundant LayersJi Ma 0008, Wei Suo, Peng Wang 0015, Yanning Zhang 0001. 3575-3584 [doi]
- GPT-ReID: Learning Fine-grained Representation with GPT for Text-based Person RetrievalXudong Wang, Lei Tan, Pingyang Dai, Liujuan Cao, Rongrong Ji. 3585-3594 [doi]
- Visual Perception Uncertainty Learning for Hallucination Detection in Large Vision-Language ModelsRunze Zhao, Fuqing Zhu, Jizhong Han, Songlin Hu 0001. 3595-3604 [doi]
- Fourier Self-Adaptation for Transferring General Pretrained Models to Specific DomainsLei Liu, Xiangdong Su, Guanglai Gao. 3605-3614 [doi]
- Scene123: One Prompt to 3D Scene Generation via Video-Assisted and Consistency-Enhanced MAEYiying Yang, Fukun Yin, Jiayuan Fan, WanZhang Li, Xin Chen 0040, Gang Yu 0002. 3615-3624 [doi]
- Zero-shot Compositional Action Recognition with Neural Logic ConstraintsGefan Ye, Lin Li 0065, Kexin Li, Jun Xiao 0001, Long Chen 0016. 3625-3634 [doi]
- MeDKCoOp: Dual Knowledge-guided Graph Prompt Learning for Biomedical Vision-Language ModelsYijun Wang, Siying Wu, Lubin Gan, Zheyu Zhang 0002, Jing Zhang, Zhangchi Hu, Huyue Zhu, Peixi Wu, Xiaoyan Sun 0001. 3635-3644 [doi]
- Twin Co-Adaptive Dialogue for Progressive Image GenerationJianhui Wang, Yangfan He, Yan Zhong 0001, Xinyuan Song 0002, Jiayi Su, Yuheng Feng, Ruoyu Wang, Hongyang He, Wenyu Zhu, Xinhang Yuan, Miao Zhang, Keqin Li, Jiaqi Chen, Tianyu Shi, Xueqian Wang. 3645-3653 [doi]
- Multi-Agent System for Comprehensive Soccer UnderstandingJiayuan Rao, Zifeng Li, Haoning Wu, Ya Zhang, Yanfeng Wang 0001, Weidi Xie. 3654-3663 [doi]
- Vision Transformer with Sparse Scan PriorYuguang Zhang, Qihang Fan, Huaibo Huang. 3664-3672 [doi]
- Training-Free Hierarchical Scene Understanding for Gaussian Splatting with Superpoint GraphsShaohui Dai, Yansong Qu, Zheyan Li, Xinyang Li, Shengchuan Zhang, Liujuan Cao. 3673-3682 [doi]
- GUI-Narrator: Detecting and Captioning Computer GUI ActionsQinchen Wu, Difei Gao, Qinghong Lin, Zhuoyu Wu, Mike Zheng Shou. 3683-3692 [doi]
- DSACap: Enhancing Visual-Semantic Alignment with Diffusion-based Framework for Image CaptioningLiangyu Fu, Junbo Wang 0003, Yuke Li, Qiangguo Jin, Hongsong Wang 0001, Jing Ya, Linjiang Huang, Liang Yao, Jiangbin Zheng 0001, Xuecheng Wu, Zhiyong Wang. 3693-3701 [doi]
- Seeing the Undefined: Chain-of-Action for Generative Semantic LabelsMeng Wei 0006, Zhongnian Li, Peng Ying, Xinzheng Xu. 3702-3711 [doi]
- Less is More: High-value Data Selection for Visual Instruction TuningZikang Liu 0001, Kun Zhou 0002, Wayne Xin Zhao, Dawei Gao, Yaliang Li, Ji-Rong Wen. 3712-3721 [doi]
- Exploring Global Correlations via Polarity Memory for Multispectral DemosaicingMengzu Liu, Junwei Xu, Tao Huang, Fangfang Wu, Le Dong, Xin Li 0005, Weisheng Dong. 3722-3730 [doi]
- Unsupervised Ego- and Exo-centric Dense Procedural Activity Captioning via Gaze Consensus AdaptationZhaofeng Shi, Heqian Qiu, Lanxiao Wang, Qingbo Wu 0001, Fanman Meng, Hongliang Li 0001. 3731-3740 [doi]
- Stepwise Decomposition and Dual-stream Focus: A Novel Approach for Training-free Camouflaged Object SegmentationChao Yin 0001, Hao Li, Kequan Yang, Jide Li, Pinpin Zhu, Xiaoqiang Li 0002. 3741-3750 [doi]
- Multi-Layer Gaussian Splatting for Single-Image Feed-Forward Spatial Scene ReconstructionShanding Diao, Yang Zhao 0002, Yuan Chen 0012, Zhao Zhang 0001, Wei Jia 0001, Ronggang Wang. 3751-3759 [doi]
- Learning Arbitrary-Scale RAW Image Downscaling with Wavelet-based Recurrent ReconstructionYang Ren, Hai Jiang 0006, Wei Li 0075, Menglong Yang, Heng Zhang 0042, Zehua Sheng, Qingsheng Ye, Shuaicheng Liu. 3760-3768 [doi]
- MM-Skin: Enhancing Dermatology Vision-Language Model with an Image-Text Dataset Derived from TextbooksWenqi Zeng, Yuqi Sun, Chenxi Ma, Weimin Tan, Bo Yan. 3769-3778 [doi]
- Clustering-Based Tail-class Mitigation for New-class DiscoveryZelei Wu, Xulun Ye, Jieyu Zhao 0002. 3779-3787 [doi]
- SDP: Spectral-Decomposed Prompting for Continual LearningSiqi Song, Limin Yu, Jimin Xiao. 3788-3797 [doi]
- VLN-ChEnv: Vision-language Navigation in Changeable EnvironmentsShubo Liu, Hongsheng Zhang, Qian Qiao, Qi Wu 0001, Peng Wang 0015. 3798-3807 [doi]
- CapRecover: A Cross-Modality Feature Inversion Attack Framework on Vision Language ModelsKedong Xiu, Sai Qian Zhang. 3808-3816 [doi]
- Optimal Feature Embedding for Document Large Visual Language ModelFan Yang 0082, Ling Deng, Zhiyong Gan, Qisheng He, Yuanbo Fang, Xiangmin Xu, Shuangping Huang, Tianshui Chen. 3817-3826 [doi]
- Compositional Zero-shot Learning via Progressive Language-based ObservationsLin Li 0065, Guikun Chen, Zhen Wang, Jun Xiao 0001, Long Chen 0016. 3827-3836 [doi]
- Pushing the Limit of Binarized Neural Network for Image Super Resolution with Smooth Information TransmissionWeimin Cheng, Zhenyu Wang, Tao Huang, Fangfang Wu, Weisheng Dong. 3837-3846 [doi]
- Reliable Cross-modal Alignment via Prototype Iterative ConstructionXiang Ma, Litian Xu, Lexin Fang, Caiming Zhang 0001, LiZhen Cui. 3847-3855 [doi]
- WaveCL: Wavelet Calibration Learning for Referring Video Object SegmentationRan Chen, Taiyi Su, Hanli Wang. 3856-3864 [doi]
- Hierarchical Spatiotemporal Context Aggregation and Speckle-aware Deformable Convolution for Echocardiography Video SegmentationJingxing Guo, Guilian Chen, Yimu Sun, Huisi Wu, Jing Qin. 3865-3874 [doi]
- Consistency of Local and Global Flatness for Federated LearningJunkang Liu, Fanhua Shang, Yuxuan Tian, Hongying Liu 0001, Yuanyuan Liu 0001. 3875-3883 [doi]
- FFCBA: Feature-based Full-target Clean-label Backdoor AttacksYangxu Yin, Honglong Chen, Yudong Gao, Peng Sun 0003, Liantao Wu, Zhe Li 0026, Weifeng Liu 0001. 3884-3892 [doi]
- EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and ModelSijing Li, Tianwei Lin, Lingshuai Lin, Wenqiao Zhang, Jiang Liu, Xiaoda Yang, Juncheng Li 0006, Yucheng He, Xiaohui Song, Jun Xiao 0001, Yueting Zhuang, Beng Chin Ooi. 3893-3902 [doi]
- Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and LocalizationChangtao Miao, Qi Chu 0001, Tao Gong, Zhentao Tan, Zhenchao Jin, Wanyi Zhuang, Man Luo, Honggang Hu, Nenghai Yu. 3903-3912 [doi]
- Ali-UI: Enhancing Complex Vision-Language Navigation with Alignment of Unified Map and Instruction ParsingShanshan Li, Jiawei Hou, Da Huang, Yanwei Fu 0001, Xiangyang Xue 0001. 3913-3922 [doi]
- Stealthy-AE: Generating Stealthy Adversarial Examples through Online Social NetworksZiming Zhao 0008, Zhaoxuan Li, Tingting Li 0004, Fan Zhang 0010. 3923-3931 [doi]
- LVLM_CSP: Accelerating Large Vision Language Models via Clustering, Scattering, and Pruning for Reasoning SegmentationHanning Chen, Yang Ni 0001, Wenjun Huang 0001, Hyunwoo Oh, Yezi Liu, Tamoghno Das, Mohsen Imani. 3932-3941 [doi]
- BAC-GCN: Background-Aware CLIP-GCN Framework for Unsupervised Multi-Label ClassificationYonghyeon Jo, Janghyun Kim, Jinsun Park. 3942-3951 [doi]
- Mitigating Query Selection Bias in Referring Video Object SegmentationDingwei Zhang, Dong Zhang, Jinhui Tang 0001. 3952-3961 [doi]
- DFCNet: Dual-Factor Compensatory Clustering Network for Modality-Imbalanced Generalized Zero-Shot LearningXiangyu Shan, Heng Song, Junwu Zhu. 3962-3971 [doi]
- Video-to-Image Affordance Grounding via Visual Conceptual LearningZhiyuan Fan, Keyi Liang. 3972-3980 [doi]
- MCA-LLaVA: Manhattan Causal Attention for Reducing Hallucination in Large Vision-Language ModelsQiyan Zhao, Xiaofeng Zhang, Yiheng Li, Yun Xing, Xiaosong Yuan, Feilong Tang, Sinan Fan, Xuhang Chen 0002, Da-Han Wang, Xu-Yao Zhang. 3981-3990 [doi]
- Medical Vision-Language Pre-training with Multimodal Variational Masked Autoencoder for Robust Medical VQADexuan Xu, Yanyuan Chen, Yu Huang 0004, Shihao E, Yiwei Lou, Yongzhi Cao, Hanpin Wang, Meikang Qiu. 3991-4000 [doi]
- T2VParser: Adaptive Decomposition Tokens for Partial Alignment in Text to Video RetrievalYili Li, Gang Xiong, Gaopeng Gou, Xiangyan Qu, Jiamin Zhuang, Zhen Li, Junzheng Shi. 4001-4009 [doi]
- ReMeREC: Relation-aware and Multi-entity Referring Expression ComprehensionYizhi Hu, Zezhao Tian, Xingqun Qi, Chen Su, Bingkun Yang, Junhui Yin, Muyi Sun, Man Zhang 0005, Zhenan Sun. 4010-4019 [doi]
- DisFaceRep: Representation Disentanglement for Co-occurring Facial Components in Weakly Supervised Face ParsingXiaoqin Wang, Xianxu Hou, Meidan Ding, Junliang Chen 0002, Kaijun Deng, Jinheng Xie, LinLin Shen. 4020-4029 [doi]
- SAM-TTT: Segment Anything Model via Reverse Parameter Configuration and Test-Time Training for Camouflaged Object DetectionZhenni Yu, Li Zhao 0005, Guobao Xiao, Xiaoqin Zhang 0002. 4030-4038 [doi]
- VL-DynaRefine: A Vision-Language Dynamic Refinement Approach for Visual ReasoningJing Ma, Haochen Sun, Zeyuan Zang, Fangxiang Feng, Caixia Yuan, Lei Ren, Huixing Jiang, Wei Chen, Xiaojie Wang 0006. 4039-4047 [doi]
- Forward-Only Continual LearningJiao Chen, Jiayi He, Fangfang Chen, Zuohong Lv, Jianhua Tang. 4048-4057 [doi]
- BOLT: Fewer Tokens but More Performance Retention for Efficient Vision-Language Models InferenceJiahua Bao, Siyao Cheng, Jiaxing Du, Changjiang He, Zeming Lang, Hao Zhang 0016, Jie Liu 0001. 4058-4067 [doi]
- CITR: Efficient Long Video Understanding Needs Causal ImportanceZiqi Yuan, Jun Li, Yanghao Li, Yuxiang Huang, Chi Chen, Shuo Wang 0013, Zhinan Gou. 4068-4076 [doi]
- Diverse and Public Features Cooperation via Gradient Rectification for Federated Prompt LearningQi Li, Yucan Zhou, Jiang Zhou, XingYou Yang, Xiaoyan Gu 0001. 4077-4086 [doi]
- Multi-State Tracker: Enhancing Efficient Object Tracking via Multi-State Specialization and InteractionShilei Wang, Gong Cheng 0003, Pujian Lai, Dong Gao, Junwei Han 0001. 4087-4096 [doi]
- Cognitive Predictive Coding Network: Rethinking the Generalization in Raven's Progressive MatricesXinyu Zhang 0021, Lingling Zhang 0005, Yanrui Wu, Muye Huang, Jun Liu 0002. 4097-4106 [doi]
- FACE: A Dual-Template and Adaptive Curriculum Framework for Unsupervised Text-Based Person SearchXiaoxuan Mu, Haoyu Tang 0002, Han Jiang 0012, Tianyuan Liang, Qinghai Zheng, Jihua Zhu. 4107-4116 [doi]
- Open-Set Image Tagging with Multi-Grained Text SupervisionXinyu Huang, Yi-Jie Huang, Youcai Zhang, Weiwei Tian, Rui Feng 0001, Yuejie Zhang, Yanchun Xie, Yaqian Li, Lei Zhang 0001. 4117-4126 [doi]
- Gloss Matters: Unlocking the Potential of Non-Autoregressive Sign Language TranslationZhiHao Wang, Shiyu Liu, Zhiwei He, Kangjie Zheng, Liangying Shao, Junfeng Yao, Jinsong Su. 4127-4136 [doi]
- Collaboration Wins More: Dual-Modal Collaborative Attention Reinforcement for Mitigating Large Vision Language Models HallucinationJiye Xie, Yifei Gao, Liangliang You, Xiang Xu, Haoran Xu, Zhiqiang Kou, Kexue Fu 0001, Youyang Qu, Wenjie Yang, Jianwei Guo, Weiliang Meng, Longxiang Gao, Haoran Yang, Changwei Wang, Yu Zhang 0133. 4137-4146 [doi]
- Towards Training-Free Open-World Classification with 3D Generative ModelsXinzhe Xia, Weiguang Zhao, Yuyao Yan, Guanyu Yang 0002, Rui Zhang 0012, Kaizhu Huang, Xi Yang 0008. 4147-4155 [doi]
- Mitigating Information Loss under High Pruning Rates for Efficient Large Vision Language ModelsMingyu Fu, Wei Suo, Ji Ma 0008, Lin Yuanbo Wu, Peng Wang 0015, Yanning Zhang 0001. 4156-4165 [doi]
- Quantum Interference-Inspired Who-What-Where Composite-Semantics Instance Search for Story VideosZijun Xu 0001, Jiahao Guo, Chunjie Zhang 0001, Zhongyuan Wang 0001, Chunxia Xiao, Chao Liang. 4166-4174 [doi]
- Pathology-Aware Reconstruction with Discriminative Knowledge Boosting Alignment for Che-Xray Vision-Language Pre-trainingLihong Qiao, Shiyi Gao, Yucheng Shu, Bin Xiao, Weisheng Li 0001, Xinbo Gao 0001. 4175-4184 [doi]
- Slot Attention with Re-Initialization and Self-DistillationRongzhen Zhao, Yi Zhao 0014, Juho Kannala, Joni Pajarinen. 4185-4192 [doi]
- NavigScene: Bridging Local Perception and Global Navigation for Beyond-Visual-Range Autonomous DrivingQucheng Peng, Chen Bai, Guoxiang Zhang, Bo Xu 0031, Xiaotong Liu, Xiaoyin Zheng, Chen Chen 0001, Cheng Lu. 4193-4202 [doi]
- CorrNeXt: Making the ConvNet-Style Correspondence Pruner Stronger for Two-View GeometryZizhuo Li, Chunbao Su, Fan Fan 0001, Jun Huang 0008, Jiayi Ma 0001. 4203-4212 [doi]
- DREAM: Integrating Hierarchical Multimodal Retrieval with Multi-page Multimodal Language Model for Documents VQAJinxu Zhang, Qiyuan Fan, Yongqi Yu, Yu Zhang 0030. 4213-4221 [doi]
- Visual Localization using Hybrid Feature Grid and Learned Weighted Global Point CloudJunyi Wang 0001, Yue Qi. 4222-4231 [doi]
- Debiasing Multimodal Large Language Models via Penalization of Language PriorsYifan Zhang 0004, Yang Shi 0009, Weichen Yu, Qingsong Wen, Xue Wang 0010, Wenjing Yang 0002, Zhang Zhang 0001, Liang Wang 0001, Rong Jin 0001. 4232-4241 [doi]
- Cross-Counter-Repeat Attention for Enhanced Understanding of Visual Semantics in Radiology Report GenerationXiaolei Bo, Feiyang Yang, Feilong Xu, Xiaoli Zhang 0001. 4242-4250 [doi]
- MPI-CD: Multi-Path Information Contrastive Decoding for Mitigating Hallucinations in Large Vision-Language ModelsJiacheng Ruan, Zongyun Zhang, Jingsheng Gao, Wenzhen Yuan 0002, Ting Liu 0016, Yuzhuo Fu. 4251-4260 [doi]
- LL-Gaussian: Low-Light Scene Reconstruction and Enhancement via Gaussian Splatting for Novel View SynthesisHao Sun, Fenggen Yu, Huiyao Xu, Tao Zhang, Changqing Zou. 4261-4270 [doi]
- RealVG: Unleashing MLLMs for Training-Free Spatio-Temporal Video Grounding in the WildHongchen Wei, Zhenzhong Chen 0001. 4271-4280 [doi]
- Visual Context Window Extension: A New Perspective for Long Video UnderstandingHongchen Wei, Zhenzhong Chen 0001. 4281-4289 [doi]
- TPDepth: Leveraging Text Prompts with ControlNet to Boost Diffusion-based Depth EstimationYu Liu, Kun Sun, Chang Tang, Yuhua Qian, Xin Li 0005. 4290-4299 [doi]
- GM-DF: Generalized Multi-Scenario Deepfake DetectionYingxin Lai, Hongyang Wang 0001, Jing Yang, Xiangui Kang, Bin Li 0011, LinLin Shen, Zitong Yu. 4300-4309 [doi]
- FedAPT: Federated Adversarial Prompt Tuning for Vision-Language ModelsKun Zhai, Siheng Chen, Xingjun Ma, Yu-Gang Jiang 0001. 4310-4318 [doi]
- BTUAP: Boosting the Transferability of Universal Adversarial Perturbations in the Black-box Setting under various data dependenciesJie Wan, Jianhao Fu, Ziqi Yang, Kui Ren 0001. 4319-4328 [doi]
- MARA: A Multimodal Adaptive Retrieval-Augmented Framework for Document Question AnsweringHui Wu, Haoquan Zhai, Yuchen Li 0006, Hengyi Cai, Peirong Zhang, Yidan Zhang, Lei Wang, Chunle Wang, Yingyan Hou, Shuaiqiang Wang, Dawei Yin 0001. 4329-4338 [doi]
- DR-VQA: Decompose-then-Reconstruct for Visual Question Answering in BLV AssistanceBocheng Pan, Hailong Shi, Xingyu Gao 0001. 4339-4348 [doi]
- U-MERE: Unconstrained Multimodal Entity and Relation Extraction with Collaborative Modeling and Order-Sensitive OptimizationWei Jia, Li Jin 0001, Kaiwen Wei, Yuying Shang, Nayu Liu, Zhicong Lu, Qing Liu, Linhao Zhang, Jiang Zhong, Yanfeng Hu. 4349-4358 [doi]
- EMIFS: Efficient Multi-scale Information Fusion Self-supervision for Medical Image SegmentationLuyao Ren, Wenxin Yu, Zhiqiang Zhang, Chang Liu. 4359-4368 [doi]
- CGCOD: Class-Guided Camouflaged Object DetectionChenxi Zhang, Qing Zhang, Jiayun Wu, Youwei Pang. 4369-4377 [doi]
- OGDepth: Leveraging Object Guidance in Diffusion Models for Enhanced Monocular Depth EstimationWenzheng Yang, Songwei Pei, Bingfeng Liu, Qian Li, Shangguang Wang. 4378-4387 [doi]
- TrustCLIP: Learning from Noisy Labels via Semantic Label Verification and Trust-aligned Gradient ProjectionXueyi Zhang, Peiyin Zhu, Yuan Liao, Xiyu Wang, Mingrui Lao, Siqi Cai 0002, Yanming Guo, Haizhou Li 0001. 4388-4397 [doi]
- Towards Explainable Fake Image Detection with Multi-Modal Large Language ModelsYikun Ji, Yan Hong 0001, Jiahui Zhan, Haoxing Chen, Jun Lan, Huijia Zhu, Weiqiang Wang 0002, Liqing Zhang 0001, Jianfu Zhang 0003. 4398-4407 [doi]
- UniAD: Integrating Geometric and Semantic Cues for Unified Anomaly DetectionXiaodong Wang 0010, Hongmin Hu, Fei Yan, Junwen Lu, Zhiqiang Zeng, Weidong Hong, Zhedong Zheng. 4408-4417 [doi]
- Ground and Reconstruct: Entity-Region Bidirectional Alignment Pre-Training for Low-Resource GMNERRunwei Situ, Yi Cai 0001, Yong Xu, Jiexin Wang. 4418-4426 [doi]
- Rodecon-net: Medical Image Segmentation via Robust Decoupling and Contrast-enhanced FusionYongquan Xue, Zhaoru Guo, Zhaozhao Su, Chong Peng, Jun Feng, Pan Zhou 0001, Marcin Pietron, Xiyuan Wang, Liejun Wang, Panpan Zheng. 4427-4435 [doi]
- MRBench: A Multi-Image Reasoning Benchmark with Adaptive Knowledge RetrievalWenxi Huang, Xiaojun Chen 0006, Qin Zhang 0011, Ting Wan, Ziqi Liu, Liangjie Zhang. 4436-4445 [doi]
- CrossMind-VL: Multi-Subject Mind-to-Video Decoding with Multimodal LLM Semantic GroundingXuanliu Zhu, Yiqiao Chai, Runnan Li, Mingying Lan, Li Gao. 4446-4454 [doi]
- PeriodVOS: Learning Periodic Patterns for Unsupervised Video Object Segmentation via Adaptive Contextual CouplingJiaqing Fan, Hanwen Qian, Mengjuan Jiang, Fanzhang Li. 4455-4463 [doi]
- Referring Expression Instance Retrieval and A Strong End-to-End BaselineXiangzhao Hao, Kuan Zhu, Hongyu Guo, Haiyun Guo, Ning Jiang, Quan Lu, Ming Tang 0001, Jinqiao Wang. 4464-4473 [doi]
- VGNC: Reducing the Overfitting of Sparse-view 3DGS via Validation-guided Gaussian Number ControlLifeng Lin, Rongfeng Lu, Quan Chen, Haofan Ren, Ming Lu, Yaoqi Sun, Chenggang Yan 0001, Anke Xue. 4474-4483 [doi]
- Regist3R: Incremental Registration with Stereo Foundation ModelSidun Liu, Wenyu Li, Peng Qiao, Yong Dou. 4484-4493 [doi]
- AnchorSync: Global Consistency Optimization for Long Video EditingZichi Liu, Yinggui Wang, Tao Wei 0002, Chao Ma 0004. 4494-4503 [doi]
- Fine-grained Zero-Shot Object DetectionHongxu Ma, Chenbo Zhang, Lu Zhang 0060, Jiaogen Zhou, Jihong Guan, Shuigeng Zhou. 4504-4513 [doi]
- MS-DETR: Towards Effective Video Moment Retrieval and Highlight Detection by Joint Motion-Semantic LearningHongxu Ma, Guanshuo Wang, Fufu Yu, Qiong Jia 0004, Shouhong Ding. 4514-4523 [doi]
- HCCM: Hierarchical Cross-Granularity Contrastive and Matching Learning for Natural Language-Guided DronesHao Ruan, Jinliang Lin, Yingxin Lai, Zhiming Luo, Shaozi Li. 4524-4533 [doi]
- Compositional Zero-Shot Learning with Contextualized Cues and Adaptive Contrastive TrainingYun Li, Lina Yao 0001, Zhe Liu 0023. 4534-4541 [doi]
- VicKAM: Visual Conceptual Knowledge Guided Action Map for Weakly Supervised Group Activity RecognitionZhuming Wang, Yihao Zheng 0002, Jiarui Li, Yaofei Wu, Yan Huang, Zun Li, Lifang Wu, Liang Wang 0001. 4542-4551 [doi]
- Dual Enhancement on 3D Vision-Language Perception for Monocular 3D Visual GroundingYuzhen Li, Min Liu 0033, Yuan Bian 0002, Xueping Wang, Zhaoyang Li, Gen Li, Yaonan Wang 0001. 4552-4561 [doi]
- Mitigating the Evolving Semantic Entanglement in Continual Learning of Vision-Language ModelsYiliang Zhu 0002, Dayan Wu, Qinghang Su, Zexian Yang, Zheng Lin 0001, Weiping Wang 0005. 4562-4570 [doi]
- SegTraj: A Segmented-Trajectory-Aware Spatio-Temporal Graph Convolutional Network for Social Group DetectionXiongwei Dang, Wenxuan Liu 0008, Xian Zhong, Zheng Wang 0007. 4571-4579 [doi]
- CSDN: CLIP-Driven Similarity-Aligned Distillation Network for Weakly-Supervised Object LocalizationSifan Zuo, Youfa Liu, Bo Du 0001. 4580-4589 [doi]
- Learning Structural Priors via Laplacian RWKV Diffusion with Light-Effect Dataset for Nighttime Visibility EnhancementDirui Xie, Xiaofang Hu, ZiHan Wei, Zhengqiqi Yang, Yanlian Jiang, Yue Zhou 0011. 4590-4599 [doi]
- Chain-of-Thought Guided Semantic Debiasing for Low-Shot Vision-Language TasksBiao Chen, Kunbin He, Zhikun Zheng, Mengmeng Jing, Lin Zuo. 4600-4609 [doi]
- Learn 3D VQA Better with Active Selection and ReannotationShengli Zhou, Yang Liu, Feng Zheng 0001. 4610-4618 [doi]
- EvoVLMA: Evolutionary Vision-Language Model AdaptationKun Ding, Ying Wang 0008, Shiming Xiang. 4619-4628 [doi]
- DSP: Dense-Sparse Parallel Networks for Self-supervised 3D Multi-person Pose Estimation from Multiple ViewsYang Liu, Zhiyong Zhang 0005. 4629-4638 [doi]
- GraphVideoAgent: Enhancing Long-form Video Understanding with Entity Relation GraphsMeng Chu, Yicong Li 0004, Tat-Seng Chua. 4639-4648 [doi]
- Test-Time Adaptation of Medical Vision-Language Models with Mixture of Modality ExpertsHancong Wang, Yue Yu 0001, Hairong Zheng, Tong Zhang 0017. 4649-4658 [doi]
- Eye-based Emotion Recognition via Event-Driven Sparse TransformersZixuan Wan, Jiqing Zhang, Yushan Wang, Hu Lin, Yafei Wang 0004, Zetian Mi, Xin Yang, XianPing Fu, Huibing Wang. 4659-4668 [doi]
- DGFSD: Bridging the Gap between Dense and Sparse for Fully Sparse 3D Object DetectionGuoxin Zhang, Zhonghong Ou, Kaiwen Xue, Jiangfeng Sun 0003, Yifan Zhu, Siyuan Yao, Yiran Shen 0007, Meina Song. 4669-4678 [doi]
- MMPro: A Decoupled Perception-Thinking-Execution Framework for Secure GUI AgentBenLong Wu, Yuang Qi, Xiuwei Shang, Weiming Zhang 0001, Nenghai Yu, Kejiang Chen. 4679-4687 [doi]
- PRIME: Prototype-Driven Class Incremental Learning for Medical Image SegmentationShengqian Zhu, Chengrong Yu, Wenbo Qi, Jiafei Wu, Ying Song, Guangjun Li, Zhang Yi, Xiaogang Xu 0002, Junjie Hu 0004. 4688-4697 [doi]
- EventFormer: A Node-graph Hierarchical Attention Transformer for Action-centric Video Event PredictionQile Su, Shoutai Zhu, Shuai Zhang, Baoyu Liang, Chao Tong 0001. 4698-4707 [doi]
- DART: Dual Adaptive Refinement Transfer for Open-Vocabulary Multi-Label RecognitionHaijing Liu, Tao Pu 0002, Hefeng Wu, Keze Wang, Liang Lin. 4708-4717 [doi]
- STATUS Bench: A Rigorous Benchmark for Evaluating Object State Understanding in Vision-Language ModelsMahiro Ukai, Shuhei Kurita, Nakamasa Inoue. 4718-4727 [doi]
- Gradient-Aware Revitalization of Non-Effective Samples in Medical Image SegmentationShiying Lin, Rong Hu, Zuoyong Li, Qinghua Lin, Jiawei Wu 0001, Changqing Zhang 0004. 4728-4737 [doi]
- Self-Supervised Human Mesh Recovery from Partial Point Cloud via a Self-Improving LoopChang Su, Beihong Jin, Fusang Zhang, Siheng Li, Zhi Wang 0016. 4738-4747 [doi]
- Noise Self-Correction via Relation Propagation for Robust Cross-Modal RetrievalRuoxuan Li, Xiangyu Wu, Yang Yang 0074. 4748-4757 [doi]
- Multi-Task Dense Prediction Fine-Tuning with Mixture of Fine-Grained ExpertsYangyang Xu, Xi Ye, Duo Su. 4758-4767 [doi]
- WMamba: Wavelet-based Mamba for Face Forgery DetectionSiran Peng, Tianshuo Zhang, Li Gao, Xiangyu Zhu 0001, Haoyuan Zhang, Kai Pang, Zhen Lei 0001. 4768-4777 [doi]
- Enhancing Visual Reliance in Text Generation: A Bayesian Perspective on Mitigating Hallucination in Large Vision-Language ModelsNanxing Hu, Xiaoyue Duan, Jinchao Zhang, Guoliang Kang. 4778-4787 [doi]
- Advancing Reliable Test-Time Adaptation of Vision-Language Models under Visual VariationsYiwen Liang, Hui Chen 0013, Yizhe Xiong, Zihan Zhou, Mengyao Lyu, Zijia Lin, Shuaicheng Niu, Sicheng Zhao, Jungong Han, Guiguang Ding. 4788-4797 [doi]
- Toward Robust Deepfake Detection: A Proactive Method Based on Watermarking and Knowledge DistillationChunpeng Wang 0001, Wenlong Ma, Li Zou, Zhiqiu Xia, Qi Li, Bin Ma 0003, Yunan Liu. 4798-4807 [doi]
- Quality Text, Robust Vision: The Role of Language in Enhancing Visual Robustness of Vision-Language ModelsFuta Waseda, Saku Sugawara, Isao Echizen. 4808-4816 [doi]
- Benchmarking Retrieval-Augmented Generation in Multi-Modal ContextsZhenghao Liu 0001, Xingsheng Zhu, Tianshuo Zhou, Xinyi Zhang, Xiaoyuan Yi, Yukun Yan, Ge Yu 0001, Maosong Sun 0001. 4817-4826 [doi]
- MESH - Understanding Videos Like Human: Measuring Hallucinations in Large Video ModelsGarry Yang, Zizhe Chen, Man Hon Wong 0001, Haoyu Lei, Yongqiang Chen 0002, Zhenguo Li, Kaiwen Zhou 0001, James Cheng. 4827-4836 [doi]
- Seeing Through Ambiguity: Effective Video-guided Machine Translation via Chaotic Fusion and Causally Aligned Spatio-temporal AttentionJiawei Zheng, Feiyan Liu, Xiaoli Wang 0002. 4837-4845 [doi]
- AF-CLIP: Zero-Shot Anomaly Detection via Anomaly-Focused CLIP AdaptationQingqing Fang, Wenxi Lv, Qinliang Su. 4846-4855 [doi]
- AttriPrompt: Dynamic Prompt Composition Learning for CLIPQiqi Zhan, Shiwei Li, Qingjie Liu 0001, Yunhong Wang 0001. 4856-4865 [doi]
- SP-Mamba: Spatial-Perception State Space Model for Unsupervised Medical Anomaly DetectionRui Pan, Ruiying Lu. 4866-4874 [doi]
- The Eye of Sherlock Holmes: Uncovering User Private Attribute Profiling via Vision-Language Model Agentic FrameworkFeiran Liu, Yuzhe Zhang, Xinyi Huang, Yinan Peng, Xinfeng Li, Lixu Wang, Yutong Shen, Ranjie Duan, Simeng Qin, Xiaojun Jia, Qingsong Wen, Wei Dong. 4875-4883 [doi]
- Simple but Effective: Sub-Volume Contrastive Learning for Class-Imbalanced Semi-Supervised 3D Medical Image SegmentationXianrun Xu, Baoyao Yang, Wanyun Li, Jingsong Lin, Yufei Xu. 4884-4893 [doi]
- Why is a Bird's Caption a Good Demonstration? Towards Effective Multimodal In-Context Learning without Dedicated DataJunlin Fang, Wenya Wang 0001, Lingli Zhang, Fengmao Lv. 4894-4903 [doi]
- h-space Based Adversarial Attack for Protection Against Few-shot PersonalizationXide Xu, Sandesh Kamath, Muhammad Atif Butt, Bogdan Raducanu. 4904-4913 [doi]
- Tree of Prompts: Aligning Hierarchical Visual Prior for Continual Generalized Category DiscoveryYiqing Hao, Yangru Huang, Yi Jin 0001, Tao Wang 0011, Yidong Li, Yigang Cen. 4914-4922 [doi]
- 3L: Curvature-Constrained Denoising Diffusion Model for 3D Lane DetectionWenxiang Liu, Yongkang Liu, Weiliang Meng, Gaoqi He, Jianhua Li. 4923-4931 [doi]
- Robust Single Image Sand Removal by Leveraging Uncertainty-aware SAM Priors and Prompt Learning with Refined Perceptual LossBingcai Wei, Hui Liu, Chuang Qian 0001, Zijian Li 0007, Wangyu Wu, Zijie Meng. 4932-4941 [doi]
- I2CR: Intra- and Inter-modal Collaborative Reflections for Multimodal Entity LinkingZiyan Liu, Junwen Li, Kaiwen Li, Tong Ruan, Chao Wang 0095, Xinyan He, Zongyu Wang, Xuezhi Cao, JingPing Liu. 4942-4951 [doi]
- VaF-LangSplat: Voxel-Aware Fusion Language Gaussian SplattingChangzhou Li, Xinyu Yang 0001, Weiguo Yang, Xinyi Li. 4952-4961 [doi]
- DACA-Net: A Degradation-Aware Conditional Diffusion Network for Underwater Image EnhancementChang Huang, Jiahang Cao, Jun Ma 0008, Kieren Yu, Cong Li 0005, Huayong Yang, Kaishun Wu. 4962-4971 [doi]
- From Captions to Rewards (CaReVL): Leveraging Large Language Model Experts for Enhanced Reward Modeling in Large Vision-Language ModelsMuzhi Dai, Jiashuo Sun, Zhiyuan Zhao 0005, Shixuan Liu, Rui Li, Junyu Gao 0001, Xuelong Li 0001. 4972-4981 [doi]
- Image Captioning with Multimodal Guidance and Search Space OptimizationYimou Guo, Yaochen Li, Jingze Liu, Jiahui Feng, Haoyi Lou, Zhimin Chen, Yuan Gao, Yuanqi Su. 4982-4991 [doi]
- Towards Harmless Multimodal Assistants with Blind Preference OptimizationYongqi Li 0001, Lu Yang 0008, Jian Wang 0054, Runyang You, Wenjie Li 0002, Liqiang Nie. 4992-5000 [doi]
- 3: Benchmarking Chart Editing with Multimodal InstructionsDonglu Yang, Liang Zhang, Zihao Yue, Liangyu Chen 0008, Yichen Xu, Wenxuan Wang 0001, Qin Jin. 5001-5009 [doi]
- Transfer Attack for Bad and Good: Explain and Boost Adversarial Transferability across Multimodal Large Language ModelsHao Cheng 0015, Erjia Xiao, Jiayan Yang, Jinhao Duan, Yichi Wang 0002, Jiahang Cao, Qiang Zhang 0029, Le Yang 0007, Kaidi Xu, Jindong Gu, Renjing Xu. 5010-5019 [doi]
- LLaPa: A Vision-Language Model Framework for Counterfactual-Aware Procedural PlanningShibo Sun, Xue Li 0011, Donglin Di, Mingjie Wei, Lanshun Nie, Weinan Zhang 0003, Dechen Zhan, Yang Song 0001, Lei Fan 0007. 5020-5029 [doi]
- SAGE: A Visual Language Model for Anomaly Detection via Fact Enhancement and Entropy-aware AlignmentGuoxin Zang, Xue Li 0011, Donglin Di, Lanshun Nie, Dechen Zhan, Yang Song 0001, Lei Fan 0007. 5030-5039 [doi]
- VLMPlanner: Integrating Visual Language Models with Motion PlanningZhipeng Tang, Sha Zhang 0002, Jiajun Deng, Chenjie Wang, Guoliang You, Yuting Huang, Xinrui Lin, Yanyong Zhang. 5040-5049 [doi]
- Draw with Thought: Unleashing Multimodal Reasoning for Scientific Diagram GenerationZhiqing Cui, Jiahao Yuan, Hanqing Wang, Yanshu Li, Chenxu Du, Zhenglong Ding. 5050-5059 [doi]
- RecipeRAG: Advancing Recipe Generation with Reinforced Retrieval Augmented GenerationJinghan Yang, Zhenbo Xu, Dehua Ma, Liu Liu, Fei Liu, Gong Huang, Zhaofeng He. 5060-5069 [doi]
- PGOV3D: Open-Vocabulary 3D Semantic Segmentation with Partial-to-Global CurriculumShiQi Zhang, Sha Zhang 0002, Jiajun Deng, Yedong Shen, Mingxiao Ma, Yanyong Zhang. 5070-5079 [doi]
- PatAug: Augmentation of Augmentation for Test-Time AdaptationXinyao Li, Dan Zhang, Zhekai Du, Lei Zhu 0002, Zhi Chen 0010, Jingjing Li 0001. 5080-5089 [doi]
- Reasoning Like Experts: Leveraging Multimodal Large Language Models for Drawing-based PsychoanalysisXueqi Ma, Yanbei Jiang, Sarah M. Erfani, James Bailey 0001, Weifeng Liu 0001, Krista A. Ehinger, Jey Han Lau. 5090-5099 [doi]
- AlignCAT: Visual-Linguistic Alignment of Category and Attribute for Weakly Supervised Visual GroundingYidan Wang, Chenyi Zhuang, Wutao Liu, Pan Gao 0001, Nicu Sebe. 5100-5109 [doi]
- Input Domain Aware MoE: Decoupling Routing Decisions from Task Optimization in Mixture of ExpertsYongXiang Hua, Haoyu Cao 0001, Zhou Tao, Bocheng Li, Zihao Wu, Chaohu Liu, Linli Xu. 5110-5119 [doi]
- Noise-Aware Decoding with Salient Region Enhancing for Zero-Shot Image CaptioningYuxin Xie, Dongyue Chen 0001, Yue Zhu, Tong Jia 0001, Shizhuo Deng. 5120-5129 [doi]
- Enhancing Multimodal In-Context Learning for Image Classification through Coreset OptimizationHuiyi Chen, Jiawei Peng, Kaihua Tang, Xin Geng 0001, Xu Yang 0021. 5130-5139 [doi]
- CalibCLIP: Contextual Calibration of Dominant Semantics for Text-Driven Image RetrievalBin Kang, Bin Chen 0022, Junjie Wang, Yulin Li, Junzhi Zhao, Junle Wang, Zhuotao Tian. 5140-5149 [doi]
- Textual and Visual Guided Task Adaptation for Source-Free Cross-Domain Few-Shot SegmentationJianming Liu, WenLong Qiu, Haitao Wei. 5150-5159 [doi]
- Formula Spotting Based on Synergy Perception and Representation MiningGang Pan 0002, Hongen Liu, Di Sun 0001. 5160-5168 [doi]
- Cross-View Geometric Collaboration for Generalizable Sparse View Neural Surface ReconstructionHang Yang, Le Hui, Jianjun Qian, Jian Yang 0003, Yigong Zhang, Jin Xie 0001. 5169-5177 [doi]
- Task Arithmetic in Trust Region: A Training-Free Model Merging Approach to Navigate Knowledge ConflictsWenju Sun, Qingyong Li, Wen Wang 0019, Yangliao Geng, Boyang Li 0001. 5178-5187 [doi]
- MPCC: A Novel Benchmark for Multimodal Planning with Complex Constraints in Multimodal Large Language ModelsYiyan Ji, Haoran Chen, Qiguang Chen, Chengyue Wu, Libo Qin 0001, Wanxiang Che. 5188-5197 [doi]
- DT-UFC: Universal Large Model Feature Coding via Peaky-to-Balanced Distribution TransformationChangsheng Gao, Zijie Liu, Li Li 0040, Dong Liu 0002, Xiaoyan Sun 0001, Weisi Lin. 5198-5207 [doi]
- Causality-aligned Prompt Learning via Diffusion-based Counterfactual GenerationXinshu Li, Ruoyu Wang 0038, Erdun Gao, Mingming Gong, Lina Yao 0001. 5208-5217 [doi]
- DSDGF-Nutri: A Decoupled Self-Distillation Network with Gating Fusion For Food Nutritional AssessmentSujuan Hou, Zhihui Feng, Hao Xiong 0001, Weiqing Min, Peng Li, Shuqiang Jiang. 5218-5227 [doi]
- PriCAF: Privacy-Preserving Contribution Assessment in Federated Learning Before Model TrainingYixin Xu 0003, Hao Wu 0067, Jingzhou Zhu, Fengyuan Xu, Sheng Zhong 0002. 5228-5236 [doi]
- CogDDN: A Cognitive Demand-Driven Navigation with Decision Optimization and Dual-Process ThinkingYuehao Huang, Liang Liu 0007, Shuangming Lei, Yukai Ma, Hao Su, Jianbiao Mei, Pengxiang Zhao, Yaqing Gu, Yong Liu 0007, Jiajun Lv. 5237-5246 [doi]
- Dual-Granularity Cross-Modal Identity Association for Weakly-Supervised Text-to-Person Image MatchingYafei Zhang, Yongle Shang, Huafeng Li 0001. 5247-5256 [doi]
- TriCLIP-3D: A Unified Parameter-Efficient Framework for Tri-Modal 3D Visual Grounding based on CLIPFan Li, Zanyi Wang, Zeyi Huang, Guang Dai, Jingdong Wang 0001, Mengmeng Wang 0005. 5257-5266 [doi]
- ViTCoT: Video-Text Interleaved Chain-of-Thought for Boosting Video Understanding in Large Language ModelsYongheng Zhang 0001, Xu Liu, Ruihan Tao, Qiguang Chen, Hao Fei 0001, Wanxiang Che, Libo Qin 0001. 5267-5276 [doi]
- AFFIR: Dual-Modal Attention Feature Fusion for Scene Text Image RetargetingGang Pan 0002, Liming Pan, Hongze Mi, Rongyu Xiong, Jiahao Wang, Di Sun 0001. 5277-5285 [doi]
- Diffusion-Guided Knowledge Distillation for Weakly-Supervised Low-Light Semantic SegmentationChunyan Wang, Dong Zhang, Jinhui Tang 0001. 5286-5295 [doi]
- TAP: Parameter-efficient Task-Aware Prompting for Adverse Weather RemovalHanting Wang, Shengpeng Ji, Shulei Wang, Hai Huang 0013, Xiao Jin, Qifei Zhang, Tao Jin 0004. 5296-5305 [doi]
- DCount: Decoupled Spatial Perception and Attribute Discrimination for Referring Expression CountingMing Li, Yupeng Hu 0003, Yinwei Wei, Hao Liu 0072, Haocong Wang, Weili Guan. 5306-5315 [doi]
- Text-to-Image Generation with Multi-modal Knowledge Graph Construction and RetrievalJiawei Meng, Zhengmao Yang, Zhiqiang Liu, Shaokai Chen, Zhizhen Liu, Wen Zhang, Huajun Chen. 5316-5325 [doi]
- Towards Hazardous Activity Recognition for A Novel Real-World DatasetShehzad Ali, Md Tanvir Islam, Ik Hyun Lee, Mingfu Xiong, Minh-Son Dao, Saeed Anwar, Sambit Bakshi, Khan Muhammad 0001. 5326-5335 [doi]
- Evaluation of Egyptian Hieroglyph Classification Across Diverse Writing StylesMaksim Golyadkin, Valeria Rubanova, Aleksandr Utkov, Dmitry Nikolotov, Ilya Makarov. 5336-5344 [doi]
- I-C Attack: In-place and Cross-pixel Augmentations for Highly Transferable Transformation-based AttacksJiaming Liang, Chi-Man Pun. 5345-5354 [doi]
- MCM-DPO: Multifaceted Cross-Modal Direct Preference Optimization for Alt-text GenerationJinLan Fu, Shenzhen Huangfu, Hao Fei 0001, Yichong Huang, Xiaoyu Shen, Xipeng Qiu, See-Kiong Ng. 5355-5364 [doi]
- Illustration Layout Generation for Slide Enhancement with Pixel-based Diffusion ModelZhaoyun Jiang, Jiaqi Guo, Shakie Liu, Chao Han, Ting Liu 0002, Jian-Guang Lou, Dongmei Zhang 0001. 5365-5374 [doi]
- FedDEAP: Adaptive Dual-Prompt Tuning for Multi-Domain Federated LearningYubin Zheng, Pak-Hei Yeung, Jing Xia, Tianjie Ju, Peng Tang 0002, Weidong Qiu, Jagath C. Rajapakse. 5375-5384 [doi]
- OCR-Critic: Aligning Multimodal Large Language Models' Perception through Critical FeedbackQiuna Tan, Runqi Qiao, Guanting Dong, Yifan Zhang, Minhui Wu, Jiapeng Wang 0005, Miaoxuan Zhang, Yida Xu, Chong Sun, Chen Li 0031, Honggang Zhang. 5385-5393 [doi]
- Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural IntegrationYicheng Pan 0004, Zhenrong Zhang, Pengfei Hu 0006, Jiefeng Ma, Jun Du 0002, Jianshu Zhang 0001, Quan Liu, Jianqing Gao, Feng Ma. 5394-5403 [doi]
- Spatial-Frequency Mamba Collaborative Learning Network for Infrared Small Target DetectionYongji Li, Luping Wang 0002. 5404-5412 [doi]
- SDVPT: Semantic-Driven Visual Prompt Tuning for Open-world Object CountingYiming Zhao, Guorong Li, Laiyun Qing, Amin Beheshti, Jian Yang 0001, Quan Z. Sheng, Yuankai Qi, Qingming Huang. 5413-5421 [doi]
- Vector-Quantized Vision Foundation Models for Object-Centric LearningRongzhen Zhao, Vivienne Huiling Wang, Juho Kannala, Joni Pajarinen. 5422-5430 [doi]
- Towards Good Generalizations for Diffusion Generated Image Detection Using Multiple Reconstruction Contrastive LearningWanyi Zhuang, Qi Chu 0001, Tao Gong, Changtao Miao, Nenghai Yu. 5431-5440 [doi]
- GeoMag: A Vision-Language Model for Pixel-level Fine-Grained Remote Sensing Image ParsingXianzhi Ma, Jianhui Li, Changhua Pei, Hao Liu 0034. 5441-5450 [doi]
- EmoSym: A Symbiotic Framework for Unified Emotional Understanding and Generation via Latent ReasoningYijie Zhu, Yibo Lyu, Zitong Yu, Rui Shao 0001, Kaiyang Zhou, Liqiang Nie. 5451-5460 [doi]
- Motion Matters: Motion-guided Modulation Network for Skeleton-based Micro-Action RecognitionJihao Gu, Kun Li 0008, Fei Wang 0073, Yanyan Wei, Zhiliang Wu, Hehe Fan, Meng Wang 0001. 5461-5470 [doi]
- Multi-Task Gaze Communication UnderstandingCheng Peng 0016, Oya Çeliktutan. 5471-5479 [doi]
- InterMind: Doctor-Patient-Family Interactive Depression Assessment Empowered by Large Language ModelsZhiyuan Zhou, Jilong Liu, Sanwang Wang, Shijie Hao, Yanrong Guo, Richang Hong. 5480-5489 [doi]
- Remember Past, Anticipate Future: Learning Continual Multimodal Misinformation DetectorsBing Wang 0018, Ximing Li 0002, Mengzhe Ye, Changchun Li, Bo Fu 0001, Jianfeng Qu, Lin Yuanbo Wu. 5490-5498 [doi]
- From Subtle Hints to Grand Expressions - Mastering Fine-grained Emotions with Dynamic Multimodal AnalysisQinfu Xu, Liyuan Pan, Shaozu Yuan, Yiwei Wei, Chunlei Wu. 5499-5508 [doi]
- Sera: Separated Coarse-to-fine Representation Alignment for Cross-subject EEG-based Emotion RecognitionZhihao Jia, Meiyan Xu, Jingyuan Wang, Ziyu Jia, Yong Li 0032, Xinliang Zhou, Chenyu Liu, Junfeng Yao, Yi Ding 0012. 5509-5518 [doi]
- HeLo: Heterogeneous Multi-Modal Fusion with Label Correlation for Emotion Distribution LearningChuhang Zheng, Chunwei Tian, Jie Wen 0001, Daoqiang Zhang, Qi Zhu 0001. 5519-5527 [doi]
- Benchmarking and Bridging Emotion Conflicts for Multimodal Emotion ReasoningZhiyuan Han, Beier Zhu, Yanlong Xu, Peipei Song, Xun Yang 0001. 5528-5537 [doi]
- Unsupervised Dual-Domain Memory Model for Time Series Anomaly DetectionMingle Zhou, Xingli Wang, Jiachen Li, Delong Han, Gang Li 0005. 5538-5546 [doi]
- VAEmo: Efficient Representation Learning for Visual-Audio Emotion With Knowledge InjectionHao Cheng, Zhiwei Zhao, Yichao He, Zhenzhen Hu 0004, Jia Li 0013, Meng Wang 0001, Richang Hong. 5547-5556 [doi]
- AStF: Motion Style Transfer via Adaptive Statistics FusorHanMo Chen, Chenghao Xu, Jiexi Yan, Cheng Deng 0002. 5557-5566 [doi]
- Rethinking Occlusion in FER: A Semantic-Aware Perspective and Go BeyondHuiyu Zhai, Xingxing Yang 0002, Yalan Ye, Chenyang Li, Bin Fan, Changze Li. 5567-5576 [doi]
- BrainFLORA: Uncovering Brain Concept Representation via Multimodal Neural EmbeddingsDongyang Li, Haoyang Qin, Mingyang Wu, Chen Wei 0006, Quanying Liu. 5577-5586 [doi]
- Learning from Heterogeneity: Generalizing Dynamic Facial Expression Recognition via Distributionally Robust OptimizationFeng-Qi Cui, Anyang Tong, Jinyang Huang, Jie Zhang 0073, Dan Guo 0001, Zhi Liu 0002, Meng Wang 0001. 5587-5596 [doi]
- Action Unit Enhance Dynamic Facial Expression RecognitionFeng Liu, Lingna Gu, Chen Shi, Xiaolan Fu. 5597-5606 [doi]
- ReactDiff: Fundamental Multiple Appropriate Facial Reaction Diffusion ModelCheng Luo, Siyang Song, Siyuan Yan, Zhen Yu, ZongYuan Ge. 5607-5616 [doi]
- NaME: A Natural Micro-expression Dataset for Micro-expression Recognition in the WildJiateng Liu, Hengcan Shi, Haiwen Liang, Xiaolin Xu, Yuan Zong, Yaonan Wang 0001, Wenming Zheng. 5617-5626 [doi]
- Wearable Music2Emotion: Assessing Emotions Induced by AI-Generated Music through Portable EEG-fNIRS FusionSha Zhao, Song Yi, Yangxuan Zhou, Jiadong Pan, Jiquan Wang, Jie Xia, Shijian Li, Shurong Dong, Gang Pan 0001. 5627-5636 [doi]
- Impact of Stickers on Multimodal Sentiment and Intent in Social Media: A New Task, Dataset and BaselineYuanchen Shi, Fang Kong 0001, Longyin Zhang. 5637-5646 [doi]
- Human vs AI: How Digital Human News Anchors Affect Our Cognitive Processes?Yan-Kai Liu, Shunyang Yao, Tao Xi, Bao-Liang Lu, Wei-Long Zheng. 5647-5656 [doi]
- CCDb+: Enhanced Annotations and Multi-Modal Benchmark for Natural Dyadic ConversationsYang Deng, Yu-Kun Lai, Paul L. Rosin. 5657-5666 [doi]
- Grounding Emotion Recognition with Visual Prototypes: VEGA - Revisiting CLIP in MERCGuanyu Hu 0003, Dimitrios Kollias, Xinyu Yang 0001. 5667-5676 [doi]
- FEALLM: Advancing Facial Emotion Analysis in Multimodal Large Language Models with Emotional Synergy and ReasoningZhuozhao Hu, Kaishen Yuan, Xin Liu 0012, Zitong Yu, Yuan Zong, Jingang Shi, Huanjing Yue, Jingyu Yang. 5677-5686 [doi]
- MoCERNet: A Modality-Complete Modeling Framework for Emotion Recognition in Physiological Signals under Imperfect Modal MatchingTianzuo Xin, Jing Wang 0060, Xiyuan Jin, Xiaojun Ning 0001, Zhiyang Feng, Youfang Lin. 5687-5696 [doi]
- Real-Time EEG Emotion Recognition from Dynamic Mixed Spatiotemporal Graph LearningYue Pan, Cunbo Li, Peiyang Li, Fali Li, Feng Wan 0003, Dezhong Yao 0001, Zehong Cao, Peng Xu 0001. 5697-5706 [doi]
- DEEMO: De-identity Multimodal Emotion Recognition and ReasoningDeng Li 0002, Bohao Xing, Xin Liu 0012, Baiqiang Xia, Bihan Wen, Heikki Kälviäinen. 5707-5716 [doi]
- Multimodal Emotion Recognition with Missing Modality via a Unified Multi-task Pre-training FrameworkZiyi Li 0003, Wei-Long Zheng, Bao-Liang Lu. 5717-5725 [doi]
- Robust Understanding of Human-robot Social Interactions through Multimodal DistillationTongfei Bian, Mathieu Chollet, Tanaya Guha. 5726-5734 [doi]
- EmoDETective: Detecting, Exploring, and Thinking Emotional Cause in VideosXuandong Huang, Yuzhe Zhou, Jiashu Li, Shiqian Lu, Shangfei Wang. 5735-5744 [doi]
- Toward Reliable Emotion Recognition: Alleviating Label Noise and Reducing Uncertain PredictionChengzhe Wang, Wenqing Ji, Chenyang Li, Tongjie Pan, Yalan Ye. 5745-5754 [doi]
- Hardness-Aware Dynamic Curriculum Learning for Robust Multimodal Emotion Recognition with Missing ModalitiesRui Liu 0008, Haolin Zuo, Zheng Lian 0004, Hongyu Yuan, Qi Fan. 5755-5764 [doi]
- LES-CLIP: A Lightweight Emotion-Sensitive Adaptation of CLIP for Precise Similar Emotion DiscriminationXiao Fu, Pengyu Wang, Wei Xi 0003, Kun Zhao 0002, Jiadong Feng, Jizhong Zhao. 5765-5774 [doi]
- Emotion across Modalities and Cultures: Multilingual Multimodal Emotion-Cause Analysis with Memory-inspired FrameworkDan Wu, Xincheng Ju, Dong Zhang 0013, Shoushan Li, Erik Cambria, Guodong Zhou. 5775-5783 [doi]
- Emotion in a Bottle: Information Bottleneck Guided Disentanglement for Emotion Domain AdaptationJiankun Zhu, Sicheng Zhao, Lulu Tian, Jing Jiang, Xi Chen 0110, Hongxun Yao. 5784-5793 [doi]
- MGHFT: Multi-Granularity Hierarchical Fusion Transformer for Cross-Modal Sticker Emotion RecognitionJian Chen 0011, Yuxuan Hu 0005, Haifeng Lu, Wei Wang 0077, Min Yang 0007, Chengming Li 0004, Xiping Hu. 5794-5803 [doi]
- Smooth Online Multiple Appropriate Facial Reaction GenerationWeicheng Xie 0001, Chunlin Yan, Siyang Song, Zitong Yu, LinLin Shen, Laizhong Cui. 5804-5813 [doi]
- Beyond Emotion Recognition: A Multi-Turn Multimodal Emotion Understanding and Reasoning BenchmarkJinpeng Hu, Hongchang Shi, Chongyuan Dai, Zhuo Li, Peipei Song, Meng Wang 0001. 5814-5823 [doi]
- MultiMind: Enhancing Werewolf Agents with Multimodal Reasoning and Theory of MindZheng Zhang, Nuoqian Xiao, Qi Chai, Deheng Ye, Hao Wang 0094. 5824-5833 [doi]
- EEG-SCMM: Soft Contrastive Masked Modeling for Cross-Corpus EEG-Based Emotion RecognitionQile Liu, Weishan Ye, Lingli Zhang, Zhen Liang. 5834-5842 [doi]
- SE2E: Recognizing Emotion behind Societal BehaviorWending Xiong, Ruimin Hu, Lingfei Ren, Xixi Li, Dengshi Li. 5843-5852 [doi]
- TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud DetectionZhiming Ma, Peidong Wang, Minhua Huang, Jinpeng Wang, Kai Wu, Xiangzhao Lv, Yachun Pang, Yin Yang, Wenjie Tang, Yuchen Kang. 5853-5862 [doi]
- Regulatory Focus Theory Induced Micro-Expression Analysis with Structured Representation LearningBohao Zhang, Haoxin Xu, Jingzhong Lin, Changbo Wang, Gaoqi He. 5863-5872 [doi]
- Multi-Information Hierarchical Fusion Transformer with Local Alignment and Global Correlation for Micro-Expression RecognitionJinsheng Wei, Jialiang Sun, Guanming Lu, Jingjie Yan, Dong Zhang. 5873-5882 [doi]
- Pretraining Large Brain Language Model for Active BCI: Silent SpeechJinzhao Zhou, Zehong Cao, Yiqun Duan, Connor Barkley, Daniel Leong, Xiaowei Jiang, Quoc Toan Nguyen, Ziyi Zhao, Thomas Do, Yu-Cheng Chang, Sheng-Fu Liang, Chin-Teng Lin. 5883-5892 [doi]
- DDSE: A Decoupled Dual-Stream Enhanced Framework for Multimodal Sentiment Analysis with Text-Centric SSMShenjie Jiang, Zhuoyu Wang, Xuecheng Wu, Hongru Ji, Mingxin Li, Xianghua Li, Chao Gao 0001. 5893-5902 [doi]
- From Individuals to Crowds: Dual-Level Public Response Prediction in Social MediaJinghui Zhang, Kaiyang Wan, Longwei Xu, Ao Li, Zongfang Liu, Xiuying Chen. 5903-5912 [doi]
- Crowd Dynamics Demand Adaptivity: Self-Adaptive Physics-Informed Neural Network for Crowd SimulationZiying Tan, Linbo Luo 0001, Haiyan Yin, Yew-Soon Ong, Wentong Cai 0001. 5913-5921 [doi]
- Dual Prompt Learning for Adapting Vision-Language Models to Downstream Image-Text RetrievalYifan Wang, Tao Wang 0053, Chenwei Tang, Caiyang Yu, Zhengqing Zang, Mengmi Zhang, Shudong Huang, Jiancheng Lv 0001. 5922-5931 [doi]
- Unsupervised Similarity-Fusion Transformer Hashing for Multimodal RetrievalZhan Yang 0001, Binghong Chen, Jiajun Tang, Yinan Li. 5932-5941 [doi]
- Cluster-Aware Contrastive Multi-View Clustering Based on Masked ViewsPenglei Wang, Ziming Quan, Danyang Wu, Jin Xu. 5942-5950 [doi]
- SaP-Bot: A Multimodal Large-Language Model for End-to-End Same-Product IdentificationYixuan Zhou 0001, Yulu Tian, Wenliang Zhong, Xingbin Yu, Heng Tao Shen, Xing Xu 0001. 5951-5960 [doi]
- Harnessing Multimodal Large Language Models for Personalized Product Search with Query-aware RefinementBeibei Zhang 0005, Yanan Lu, Ruobing Xie, Zongyi Li, Siyuan Xing, Tongwei Ren, Fen Lin. 5970-5978 [doi]
- IM-POI: Bridging ID and Multi-modal Gaps in Next POI RecommendationSiyuan Huang, Jiahui Jin, Xin Lin, Xigang Sun, Yukun Ban. 5979-5987 [doi]
- Graph-based Approximate Nearest Neighbor Search by Deep Reinforcement RoutingMingjie Li 0004, Junhao Lin, Dian Ouyang, Ying Zhang 0001, Wei Wang 0011. 5988-5997 [doi]
- TAMER: Interest Tree Augmented Modality Graph Recommender for Multimodal RecommendationFanshen Meng, Zhenhua Meng, Ru Jin, Yuli Chen 0001, Rongheng Lin, Budan Wu. 5998-6006 [doi]
- Generating Negative Samples for Multi-Modal RecommendationYanbiao Ji, Dan Luo 0004, Chang Liu 0078, Shaokai Wu, Jing Tong, Qichen He, Deyi Ji, Hongtao Lu 0001, Yue Ding 0001. 6007-6016 [doi]
- Topic Guided Multi-faceted Semantic Disentanglement for CTR predictionFengxin Li, Zhiqian Yin, Hongyan Liu 0002, Jingcai Guo, Jun He 0008, Yi Li, Chao Zhou, Jun Zhang, Haijie Gu. 6017-6026 [doi]
- Audio Does Matter: Importance-Aware Multi-Granularity Fusion for Video Moment RetrievalJunan Lin, Daizong Liu, Xianke Chen, Xiaoye Qu, Xun Yang 0001, Jixiang Zhu, Sanyuan Zhang, Jianfeng Dong. 6027-6036 [doi]
- Dual Uncertainty-Guided Feature Alignment Learning for Text-Based Person RetrievalYufei Zheng, Jiawei Liu 0001, Bingyu Hu, Zikun Wei, Yong Wu, Zheng-Jun Zha. 6037-6046 [doi]
- Unsupervised Cross-Modal Person Search via Progressive Diverse Text GenerationFeng Chen, Jielong He, Yang Liu, Heng Liu, Zhe Chen, Yaxiong Wang. 6047-6056 [doi]
- PLGeo: A Patch-level Framework to Overcome Orientation Discrepancies in Cross-view Geo-localizationYiru Li, Yingying Zhu 0001. 6057-6065 [doi]
- Factorized Transformer Hashing with Adaptive Routing for Large-scale Image RetrievalYadong Huo, Qibing Qin, Wenfeng Zhang, Lei Huang 0010, Jie Nie. 6066-6074 [doi]
- Prototype-Guided Representation Projection for Multi-Domain Multi-Task RecommendationBinrui Wu, Haochen Sui, Jiaye Lin, Jiechao Gao, Ting Xu, Keyan Jin, Xuesong Zhang. 6075-6083 [doi]
- MedAlign: Enhancing Combinatorial Medication Recommendation with Multi-modality AlignmentHang Lv 0010, Zixuan Guo, Zijie Wu, Yanchao Tan, Guofang Ma, Zhigang Lin, Xiping Chen, Hong Cheng 0001, Carl Yang 0001. 6084-6092 [doi]
- Unveiling the Impact of Multi-modal Content in Multi-modal Recommender SystemsGuipeng Xv, Xinyu Li, Yi Liu 0071, Chen Lin 0001, Xiaoli Wang 0002. 6093-6102 [doi]
- LLM-Grounded Diffusion for Cross-Domain RecommendationKuan Liu, Ke Wang, Ji Zhang, Gang Zhou. 6103-6112 [doi]
- OFFSET: Segmentation-based Focus Shift Revision for Composed Image RetrievalZhiwei Chen, Yupeng Hu 0003, Zixu Li 0001, Zhiheng Fu, Xuemeng Song, Liqiang Nie. 6113-6122 [doi]
- Lightweight Relational Proposal Network with Dual-Branch Distillation for Video Moment RetrievalYujia Zhu, Hao Yang, Yibo Zhao 0001, Chunjie Ma, Weili Guan, Zan Gao. 6123-6132 [doi]
- 3-MRec: Invariant Learning with Information Bottleneck for Incomplete Modality RecommendationHuilin Chen 0002, MiaoMiao Cai, Fan Liu 0008, Zhiyong Cheng 0001, Richang Hong, Meng Wang 0001. 6133-6142 [doi]
- HUD: Hierarchical Uncertainty-Aware Disambiguation Network for Composed Video RetrievalZhiwei Chen, Yupeng Hu 0003, Zixu Li 0001, Zhiheng Fu, Haokun Wen, Weili Guan. 6143-6152 [doi]
- Hubness Reduction with Dual Bank Sinkhorn Normalization for Cross-Modal RetrievalZhengxin Pan, Haishuai Wang, Fangyu Wu, Peng Zhang 0001, Jiajun Bu. 6153-6162 [doi]
- DeCoRec: Decoupled Collaborative Refinement for Multi-Modal Sequential RecommendationsZhaoqi Chen, Wanni Xu, Yunfeng Zhang, Yawei Hou, Zhenyu Wen, Cong Wang. 6163-6172 [doi]
- Learning Partially-Decorrelated Common Spaces for Ad-hoc Video SearchFan Hu, Zijie Xin, Xirong Li 0001. 6173-6182 [doi]
- Open3DSearch: Zero-Shot Precise Retrieval of 3D Shapes Using Text DescriptionsXiong Li, Yikang Yan, Zhenyu Wen, Qin Yuan 0001, Fangda Guo, Zhen Hong, Ye Yuan 0001. 6183-6192 [doi]
- FITMM: Adaptive Frequency-Aware Multimodal Recommendation via Information-Theoretic Representation LearningWei Yang 0034, Rui Zhong 0003, Yiqun Chen 0004, Shixuan Li, Heng Ping, Chi Lu, Peng Jiang 0002. 6193-6202 [doi]
- Boosting Guided Diffusion with Large Language Models for Multimodal Sequential RecommendationTe Song, Lianyong Qi, Weiming Liu 0005, Fan Wang 0020, Xiaolong Xu 0001, Hongsheng Hu, Yang Cao 0019, Xuyun Zhang, Amin Beheshti. 6203-6212 [doi]
- Decoupled Identity and Attribute Tokenization for Person Re-IdentificationRui Shang, Min Liu 0033, Xueping Wang, Yuan Bian 0002, Yaonan Wang 0001. 6213-6222 [doi]
- Mitigating Cross-modal Representation Bias for Multicultural Image-to-Recipe RetrievalQing Wang, Chong-Wah Ngo, Yu Cao, Ee-Peng Lim. 6223-6231 [doi]
- Dual-Phase Playtime-guided Recommendation: Interest Intensity Exploration and Multimodal Random WalksJingmao Zhang, Zhiting Zhao, Yunqi Lin, Jianghong Ma, Tianjun Wei, Haijun Zhang 0002, Xiaofeng Zhang 0002. 6232-6241 [doi]
- LEGO: A Lightweight and Efficient Multiple-Attribute Unlearning Framework for Recommender SystemsFengyuan Yu, Yuyuan Li 0001, Xiaohua Feng, Junjie Fang, Tao Wang, Chaochao Chen. 6242-6251 [doi]
- CHORD: Customizing Hybrid-precision On-device Model for Sequential Recommendation with Device-cloud CollaborationTianqi Liu, Kairui Fu, Shengyu Zhang 0001, Wenyan Fan, Zhaocheng Du, Jieming Zhu, Fan Wu 0006, Fei Wu 0001. 6252-6261 [doi]
- DiSCo: Disentangled Attribute Manipulation Retrieval via Semantic Reconstruction and Consistency RegularizationMin Tan 0005, Guanhao Liu, Huijing Zhan, Yuyu Yin, Zhou Yu 0001, Jiajun Ding, Yinfu Feng. 6262-6270 [doi]
- Online Cross-Modal Hashing with Multi-Level MemoryWentao Fan 0003, Chao Zhang 0078, Chunlin Chen 0001, Huaxiong Li. 6271-6279 [doi]
- DiffTMR: Diffusion-based Hierarchical Alignment for Text-Molecule RetrievalChenxu Wang, Dong Zhou 0001, Ting Liu, Jianghao Lin, Yongmei Zhou, Aimin Yang 0002. 6280-6288 [doi]
- Towards Temporal-Aware Multi-Modal Retrieval Augmented Generation in FinanceFengbin Zhu, Junfeng Li, Liangming Pan, Wenjie Wang 0007, Fuli Feng, Chao Wang 0049, Huanbo Luan, Tat-Seng Chua. 6289-6297 [doi]
- Flip is Better than Noise: Unbiased Interest Generation for Multimedia RecommendationYue He, Jingxi Xie, Fengling Li 0001, Lei Zhu 0002, Jingjing Li 0001. 6298-6306 [doi]
- MIRA: A Novel Framework for Fusing Modalities in Medical RAGJinhong Wang, Tajamul Ashraf, Zongyan Han, Jorma Laaksonen, Rao Muhammad Anwer. 6307-6315 [doi]
- Refining Contrastive Learning and Homography Relations for Multi-Modal RecommendationShouxing Ma, Yawen Zeng, Shiqing Wu 0001, Guandong Xu. 6316-6324 [doi]
- The Best is Yet to Come: Graph Convolution in the Testing Phase for Multimodal RecommendationJinfeng Xu 0003, Zheyu Chen 0003, Shuo Yang 0011, Jinze Li 0001, Edith C. H. Ngai. 6325-6334 [doi]
- VibeSpace: Automatic Generation of Data and Vector Embeddings for Arbitrary Domains and Cross-domain Mappings using LLMsKipp Freud, Daniel Collins, Delmiro D. Sampaio Neto, Grant Stevens. 6335-6342 [doi]
- Label Prediction Inherited Hashing for Cross-Modal Retrieval: Applying Supervised Hashing to Unsupervised TasksKaihang Jiang, Wai-Keung Wong, Jianyang Qin, Xiaozhao Fang, Jie Wen 0001, Bingzhi Chen, Hongbo Gao 0001. 6343-6352 [doi]
- Asymmetric Pre-aligned Anchor Contrastive Enhanced Diffusion Hashing Model for Incomplete Multimodal RetrievalYang Yu, MeiYu Liang, Wei Huang, Juncheng Zheng, Kangkang Lu 0002, Yawen Li 0001, Junping Du 0001, Zhe Xue, Wu Liu. 6353-6362 [doi]
- DMMD4SR: Diffusion Model-based Multi-level Multimodal Denoising for Sequential RecommendationWeihai Lu, Li Yin. 6363-6372 [doi]
- Contrastive Prototype Framework for Calibrating Video RecommendationFan Li, Jiazhen Huang, Shisong Tang, Bing Han, Huafeng Cao, Haochen Sui, Ting Xu, Xiaoyu Kang. 6373-6382 [doi]
- ShieldIR: Privacy-Preserving Unsupervised Cross-Domain Image Retrieval via Dual Protection TransformationZixin Tang, Haihui Fan, Jinchao Zhang 0002, Hui Ma 0002, Xiaoyan Gu 0001, Bo Li 0063, Weiping Wang 0005. 6383-6392 [doi]
- Deep Probabilistic Binary Embedding via Learning Reliable Uncertainty for Cross-Modal RetrievalKun Cheng, Qibing Qin, Wenfeng Zhang, Lei Huang 0010, Jie Nie. 6393-6402 [doi]
- Generative Ghost: Investigating Ranking Bias Hidden in AI-Generated VideosHaowen Gao, Liang Pang 0001, Shicheng Xu, Leigang Qu, Tat-Seng Chua, Huawei Shen, Xueqi Cheng. 6403-6411 [doi]
- Parameter-Efficient Variational AutoEncoder for Multimodal Multi-Interest RecommendationNhu-Thuat Tran, Hady W. Lauw. 6412-6420 [doi]
- Multi-Domain Enhancement via Residual Interwoven Transfer in Cross-Domain Sequential RecommendationQingtian Bian, Tieying Li, Marcus Vinícius de Carvalho, Jiaxing Xu, Hui Fang 0002, Yiping Ke. 6421-6430 [doi]
- Why Generate When You Can Transform? Unleashing Generative Attention for Dynamic RecommendationYuli Liu, Wenjun Kong, Weizhi Ma, Cheng Luo. 6431-6440 [doi]
- When Headlines Meet Minds: Empowering News Recommendations with Social SimulatorYanwei Xie, Weizhi Nie, Lanjun Wang, Hongshuo Tian, Changtai Shi, An-An Liu. 6441-6450 [doi]
- Leveraging Multimodal Data and Side Users for Diffusion Cross-Domain RecommendationFan Zhang, Jinpeng Chen 0001, Huan Li 0003, Senzhang Wang, Yuan Cao 0003, Kaimin Wei, Jianxiang He, Feifei Kou, Jinqing Wang. 6451-6460 [doi]
- AEMVC: Mitigate Imbalanced Embedding Space in Multi-view ClusteringPengyuan Li 0013, Man Liu 0003, Dongxia Chang, Yiming Wang 0007, Zisen Kong, Yao Zhao 0001. 6461-6470 [doi]
- Query-Focused Multimodal Summarization with Gate-Guided Mixture-of-ExpertsJiajun Han, Xuran Yang, Hui Zhang. 6471-6480 [doi]
- AIGC-Enhanced UAV-Based 3D Mapping and Trajectory Planning for Rapid Disaster ResponseXiaohang Zhang, Hui Gao 0002, Bo Zhang 0032, Xiao Chen, Kun Niu, Tan Yang, Wufan Wang, Wendong Wang 0003. 6481-6489 [doi]
- A Comprehensive Benchmark for Electrocardiogram Time-SeriesZhijiang Tang, Jiaxin Qi, Yuhua Zheng, Jianqiang Huang 0001. 6490-6499 [doi]
- GaussianCross: Cross-modal Self-supervised 3D Representation Learning via Gaussian SplattingLei Yao, Yi Wang 0068, Yi Zhang, Moyun Liu, Lap-Pui Chau. 6500-6509 [doi]
- HoloTrace: LLM-based Bidirectional Causal Knowledge Graph for Edge-Cloud Video Anomaly DetectionHanling Wang, Qing Li 0006, Li Chen 0008, Haidong Kang, Fei Ma 0006, Yong Jiang 0001. 6510-6519 [doi]
- Multi-faceted Complementary Learning for Incomplete Multi-view Multi-label ClassificationXinyu Xiao, Peixi Peng, Qiang Wang 0022, Chao Xing, Shuhan Qi. 6520-6529 [doi]
- Probabilistic Mixture of Hyperbolic Mamba for Few-Shot Class-Incremental LearningYawen Cui, Wenbin Zou, Huiping Zhuang, Yi Wang 0068, Lap-Pui Chau. 6530-6539 [doi]
- GTHNA: Local-global Graph Transformer with Memory Reconstruction for Holistic Node Anomaly EvaluationMingkang Li 0005, Xuexiong Luo, Yue Zhang, Yaoyang Li, Fu Lin. 6540-6548 [doi]
- VSumMamba: Mamba Empowered Efficient Video Summarization with Multi-Scale Spatial-Temporal ModelingYamiao Ding, Tianrui Liu 0001, Zhizhou Lu, Jun-Jie Huang, Wentao Zhao, Xinwang Liu 0002, Meng Wang 0001. 6549-6557 [doi]
- Aligned or Apart? Multi-Agent Insights into Consumer and Brand Messaging DiscrepanciesHaotian Gan, Yudong Li 0001, Wanyue Li, Weidong Tang. 6558-6566 [doi]
- OmniDoctor: Towards LLM-centric Lifelong Learning for New Emerging Medical VQA TasksNa Jiang, Wenhui Zheng, Xuqian Gu, Jingjing Wang. 6567-6575 [doi]
- ExpStar: Towards Automatic Commentary Generation for Multi-discipline Scientific ExperimentsJiali Chen, Yujie Jia, Zihan Wu, Jinyu Yang, Jianpeng Chen, Xusen Hei, Jiayuan Xie, Yi Cai 0001, Qing Li 0001. 6576-6585 [doi]
- Skynet-V1: Towards Early Warning of Video Abnormal Events via A Spatial-temporal Causal-enhanced MoE FrameworkJunxiao Ma, Jingjing Wang, Min Zhang, Guodong Zhou. 6586-6595 [doi]
- SD-VSum: A Method and Dataset for Script-Driven Video SummarizationManolis Mylonas, Evlampios Apostolidis, Vasileios Mezaris. 6596-6604 [doi]
- Where Watermark Meets Beauty: Expert-Guided Aesthetic Visible Watermarking for Digital ArtworksChangjuan Ran, Fang Liu 0002, Runqi Fang, Xiangyu Meng, Shenglan Cui, Yunfan Ye. 6605-6614 [doi]
- Music2Palette: Emotion-aligned Color Palette Generation via Cross-Modal Representation LearningJiayun Hu, Yueyi He, Tianyi Liang 0002, Changbo Wang, Chenhui Li 0001. 6615-6624 [doi]
- PPJudge: Towards Human-Aligned Assessment of Artistic Painting ProcessShiqi Jiang 0001, Xinpeng Li 0002, Xi Mao, Changbo Wang, Chenhui Li 0001. 6625-6633 [doi]
- Multimodal LLMs Can Reason about Aesthetics in Zero-ShotRuixiang Jiang, Chang Wen Chen. 6634-6643 [doi]
- DA-Font: Few-Shot Font Generation via Dual-Attention Hybrid IntegrationWeiran Chen 0001, Guiqian Zhu, Ying Li 0065, Yi Ji 0001, Chunping Liu. 6644-6653 [doi]
- ArtFRD: A Fisher-Rao Mixture Metric for Generative Model Aesthetic EvaluationChuanwei Huang, Zexi Jia, Hongyan Fei, Yeshuang Zhu, Zhiqiang Yuan, Jinchao Zhang 0001, Jie Zhou 0016. 6654-6662 [doi]
- CoheDancers: Enhancing Interactive Group Dance Generation through Music-Driven Coherence DecompositionKaiXing Yang, Xulong Tang, Haoyu Wu, Biao Qin, Hongyan Liu 0002, Jun He 0008, Zhaoxin Fan. 6663-6671 [doi]
- Multi-Modal Semantic Parsing for the Interpretation of Tombstone InscriptionsXiao Zhang, Johan Bos. 6672-6681 [doi]
- AnimeColor: Reference-based Animation Colorization with Diffusion TransformersYuhong Zhang, Liyao Wang, Han Wang, Danni Wu, Zuzeng Lin, Feng Wang 0015, Li Song 0001. 6682-6690 [doi]
- Infusing AI Art with Cultural Authenticity Through the Culture-Specific LoRAZuona Chen, James She. 6691-6699 [doi]
- ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art UnderstandingShuai Wang 0054, Ivona Najdenkoska, Hongyi Zhu, Stevan Rudinac, Monika Kackovic, Nachoem Wijnberg, Marcel Worring. 6700-6709 [doi]
- CultiVerse: Towards Cross-Cultural Understanding for Paintings with Large Language ModelWei Zhang 0219, Wong Kam-Kwai, Biying Xu, Yiwen Ren, Yuhuai Li, Yingchaojie Feng, Minfeng Zhu 0001, Wei Chen 0001. 6710-6719 [doi]
- HarmoniVox: Painting Voices to Match the Avatar's SoulSongtao Zhou, Xiaoyu Qin, Yixuan Zhou 0002, Qixin Wang 0002, Zeyu Jin, Zixuan Wang 0026, Zhiyong Wu 0001, Jia Jia 0001. 6720-6729 [doi]
- Kai Shu CalligraphyTiancheng Liu, Jiayi Ye, Shumeng Zhang, Kang Zhang 0001, Chen Liang. 6730-6739 [doi]
- SimViews: An Interactive Multi-Agent System Simulating Visitor-to-Visitor Conversational Patterns to Present Diverse Perspectives of Artifacts in Virtual MuseumsMingyang Su, Chao Liu, Jingling Zhang, Shuang Wu, Mingming Fan 0001. 6740-6750 [doi]
- 2: Visual Question Answering for Video Quality AssessmentZiheng Jia, Zicheng Zhang, Jiaying Qian, Haoning Wu 0001, Wei Sun 0029, Chunyi Li, Xiaohong Liu 0001, Weisi Lin, Guangtao Zhai, Xiongkuo Min. 6751-6760 [doi]
- Towards Explainable Partial-AIGC Image Quality AssessmentJiaying Qian, Ziheng Jia, Zicheng Zhang, Zeyu Zhang, Guangtao Zhai, Xiongkuo Min. 6761-6770 [doi]
- Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation MetricZhichao Zhang, Wei Sun 0029, Xinyue Li 0001, Yunhao Li, Qihang Ge, Jun Jia, Zicheng Zhang, Zhongpeng Ji, Fengyu Sun, Shangling Jui, Xiongkuo Min, Guangtao Zhai. 6771-6780 [doi]
- Decoupled Motion Prediction for Real-time G-buffer Free Frame ExtrapolationJiawei Zhang, Haonan Zhang, Weitao Zhang, Liang Pu, Zesen Feng, Jie Guo 0001. 6781-6790 [doi]
- Identify, Isolate, and Purge: Mitigating Hallucinations in LVLMs via Self-Evolving DistillationWenhao Li, Xiu Su, Jingyi Wu, Feng Yang, Yang Liu, Yi Chen, Shan You, Chang Xu 0002. 6791-6800 [doi]
- DeLoad: Demand-Driven Short-Video Preloading with Scalable Watch-Time EstimationTong Liu, Zhiwei Fan, Guanyan Peng, Haodan Zhang, Yucheng Zhang, Zhen Wang, Pengjin Xie, Liang Liu 0001. 6801-6809 [doi]
- Meta-Illustrator: Transferring Illustrations from 2D Interactive Image Space to 3D Immersive Exploration SpaceRichen Liu, Lingyu Sun, Xuefeng Huang, Yiran Li, Jiang Zhang, Siru Chen, Zhouhao Wu, Ayush Kumar, Chufan Lai. 6810-6819 [doi]
- GeoQE: Enhancing Quality of Experience in Point Cloud StreamingJunzhe Zhang, Chengfeng Han, Dandan Ding, Zhan Ma 0001. 6820-6829 [doi]
- InstructCrop: Teaching Multimodal Large Language Models to Crop Aesthetic ImagesXiangfei Sheng, Pangu Xie, Weidong Zou, Pengfei Chen 0003, Tong Zhu 0003, Leida Li. 6830-6839 [doi]
- PP-Motion: Physical-Perceptual Fidelity Evaluation for Human Motion GenerationSihan Zhao, Zixuan Wang 0026, Tianyu Luan, Jia Jia 0001, Wentao Zhu, Jiebo Luo 0001, Junsong Yuan 0001, Nan Xi. 6840-6849 [doi]
- HandSolo: A Mid-Air Hand Pose Interaction Method Based on Disentangled Degrees-of-Hand-FreedomSongpei Xu, Xuri Ge, Chaitanya Kaul, Roderick Murray-Smith. 6850-6858 [doi]
- Towards Consumer-Grade Cybersickness Prediction: Multi-Model Alignment for Real-Time Vision-Only InferenceYitong Zhu, Zhuowen Liang, Yiming Wu, Tangyao Li, Yuyang Wang. 6859-6867 [doi]
- DARL: Mitigating Gradient Conflicts in Long-Tailed Out-of-Distribution LearningXuan Zhang, Sin Chee Chin, Jing-Hao Xue, Xiaochen Yang, Wenming Yang. 6868-6877 [doi]
- PG-Agent: An Agent Powered by Page GraphWeizhi Chen, Ziwei Wang, Leyang Yang, Sheng Zhou 0004, Xiaoxuan Tang, Jiajun Bu, Yong Li 0004, Wei Jiang 0011. 6878-6887 [doi]
- RTR-GS: 3D Gaussian Splatting for Inverse Rendering with Radiance Transfer and ReflectionYongyang Zhou, Fanglue Zhang, Zichen Wang, Lei Zhang. 6888-6897 [doi]
- Graph-Perceptron with Semantic Fidelity for No-Reference Super-Resolution Image Quality AssessmentLei Chen. 6898-6907 [doi]
- LMM4Edit: Benchmarking and Evaluating Multimodal Image Editing with LMMsZitong Xu, Huiyu Duan, Bingnan Liu, Guangji Ma, Jiarui Wang, Liu Yang, Shiqi Gao, Xiaoyu Wang, Jia Wang 0004, Xiongkuo Min, Guangtao Zhai, Weisi Lin. 6908-6917 [doi]
- Bring the VibeOn: Designing a Multimodal Interface for Shared Emotional Experiences in Live-streamed ConcertsGyeongjin Kim, Sebin Lee, Daye Kim, Jungjin Lee, Minju Kim. 6918-6927 [doi]
- FVQ: A Large-Scale Dataset and an LMM-based Method for Face Video Quality AssessmentSijing Wu, Yunhao Li, Ziwen Xu, Yixuan Gao, Huiyu Duan, Wei Sun 0029, Guangtao Zhai. 6928-6937 [doi]
- FedRog: Robust Federated Graph Classification for Strong Heterogeneity and High-Noise ScenariosDe-li, Zhou Tan, Qiyu Li, Zeming Gan, Tiange Xia, Jinyan Wang, Xianxian Li. 6938-6947 [doi]
- Multi-Dimensional Text-to-Face Image Quality Assessment Using LLM: Database and MethodYixuan Gao, Xiongkuo Min, Jinliang Han, Yuqin Cao, Sijing Wu, Yunze Dou, Guangtao Zhai. 6948-6957 [doi]
- Text-Visual Semantic Constrained AI-Generated Image Quality AssessmentQiang Li, Qingsen Yan, Haojian Huang, Peng Wu 0015, Haokui Zhang, Yanning Zhang 0001. 6958-6966 [doi]
- Excavating the Most Critical Gaussians: Sparse Selection and Structural Optimization for Efficient 3DGS CompressionYang Hu, Jingui Ma, Yucheng Yang, Jie Liang, Jinbo Yan, Jiahao Wu, Jiayu Yang, Yang Deng, Ronggang Wang. 6967-6976 [doi]
- DMF2Mel: A Dynamic Multiscale Fusion Network for EEG-Driven Mel Spectrogram ReconstructionCunhang Fan, Sheng Zhang, Jingjing Zhang, Enrui Liu, Xinhui Li, Gangming Zhao, Zhao Lv. 6977-6985 [doi]
- CLIP-MT: Multi-Modal Knowledge-Driven Adaptive Scale Feature Allocation for Multi-Task Dense PredictionShalayiding Sirejiding, Yue Ding 0001, Yuxiang Lu, Xinyi Hou, Shaokai Wu, Qichen He, Chunlin Wang, Wenqiang Guo, Hongtao Lu 0001. 6986-6995 [doi]
- Learning to Be a Doctor: Searching for Effective Medical Agent ArchitecturesYangyang Zhuang, Wenjia Jiang, Jiayu Zhang, Ze Yang, Joey Tianyi Zhou, Chi Zhang. 6996-7005 [doi]
- A Multimodal Evaluation Framework for Spatial Audio Playback Systems: From Localization to Listener PreferenceChanghao Pan, Wenxiang Guo, Yu Zhang 0126, Zhiyuan Zhu, ZheTao Chen, Han Wang 0019, Zhou Zhao 0001. 7006-7015 [doi]
- ExplorAR: Assisting Older Adults to Learn Smartphone Apps through AR-powered Trial-and-Error with Interactive GuidanceJiawei Li 0009, Linjie Qiu, Zhiqing Wu, Qiongyan Chen, Ziyan Wang, Mingming Fan 0001. 7016-7025 [doi]
- DenseSR: Image Shadow Removal as Dense PredictionYu-Fan Lin, Chia-Ming Lee, Chih-Chung Hsu. 7026-7035 [doi]
- A Comprehensive Model for Visual Fatigue Assessment in 3D Light Field Displays Based on Eye Movement Data AnalysisYu Chen, Binbin Yan, Shuo Chen, Xinzhu Sang. 7036-7044 [doi]
- FeatShield: Isolating Malicious Feature Extractors for Backdoor-Robust Federated LearningZhou Tan, De-li, Yirui Huang, Jia-Li Yin, Ximeng Liu. 7045-7054 [doi]
- Evaluating Visual Quality of Autostereoscopic 3D Displays via a Multimodal Parameter Perception NetworkLiqian Zhang, Feng Yuan, Haoran Xie 0001, Fu Lee Wang, Zhaoqing Pan. 7055-7063 [doi]
- EEmo-Bench: A Benchmark for Multi-modal Large Language Models on Image Evoked Emotion AssessmentLancheng Gao, Ziheng Jia, Yunhao Zeng, Wei Sun 0029, Yiming Zhang, Wei Zhou 0021, Guangtao Zhai, Xiongkuo Min. 7064-7073 [doi]
- Inverse-Tone-Mapped HDR Video Quality Assessment for Broadcast Television: A Comprehensive Dataset and SDR-Referenced MethodLeidong Fan, Qian Zhang, Qing Li. 7074-7083 [doi]
- Bridging the Lab and the Wild: Behavioral Experiments as a Pathway to QoE Research Closer to Realistic EnvironmentDominika Wanat, Dawid Juszka, Mikolaj Leszczuk, Lucjan Janowski. 7084-7092 [doi]
- Like or Not to Like: An Usecase of Vietnamese Street Food Videos on YouTubeDuy X. Nguyen, Hoang V. Hoan, Ninh A. Vu, Loc T. Nguyen, Trung T. Phan. 7093-7102 [doi]
- Walking-with-Portals vs. Teleport in VR: Why Walking and Portals Matter in Small SpacesAna Rita Rebelo, Pedro A. Ferreira, André Tomás Ribeiro, Rui Nóbrega. 7103-7112 [doi]
- MT-DPCQA: A Multimodal Time-aware Learning Approach for No-Reference Dynamic Point Cloud Quality AssessmentSwarna Chakraborty, Mylène C. Q. Farias. 7113-7122 [doi]
- Automatic Accessible Multimodal Translation of Graphics Using A Refreshable Pin ArraySeung-gyeom Kim, Areum Kim, Eunchae Kim, Minho Chung, Yongjae Yoo. 7123-7132 [doi]
- Taming Anomalies with Down-Up Sampling Networks: Group Center Preserving Reconstruction for 3D Anomaly DetectionHanzhe Liang, Jie Zhang, Tao Dai 0001, LinLin Shen, Jinbao Wang, Can Gao. 7133-7141 [doi]
- Uni-Sight: An E2E Vision-Language-Action System Unifying Multi-View Alignment and Multi-Modal FusionDaixun Li, Sibo He, Jiayun Tian, Yusi Zhang, Weiying Xie, Mingxiang Cao, Donglai Liu, Zirui Li, Tianlin Hui, Rui Huang, Yunsong Li 0001. 7142-7151 [doi]
- Degradation-Consistent Learning via Bidirectional Diffusion for Low-Light Image EnhancementJinhong He, Minglong Xue, Zhipu Liu, Mingliang Zhou 0001, Aoxiang Ning, Palaiahnakote Shivakumara. 7152-7161 [doi]
- Improving Compositional Generalization in Cross-Embodiment Learning via Mixture of Disentangled PrototypesRen Wang, Xin Wang 0019, Tongtong Feng, Xinyue Gong, Guangyao Li, Yu-Wei Zhan, Qing Li 0046, Wenwu Zhu 0001. 7162-7171 [doi]
- TF-ATM: Training-Free Adaptive Token MergingXin Zhang 0092, Weiying Xie, Yunsong Li 0001, Xiaoyu Chen, Tianlin Hui, Jitao Ma, Leyuan Fang. 7172-7180 [doi]
- Perspective from a Higher Dimension: Can 3D Geometric Priors Help Visual Floorplan Localization?Bolei Chen, Jiaxu Kang, Haonan Yang 0001, Ping Zhong, Jianxin Wang. 7181-7190 [doi]
- Bright to Dark: Stage-wise Bilevel Knowledge Transfer for Seeing Text in the DarkChengpei Xu, Wenhao Zhou, Long Ma 0002, Weimin Wang 0007, Feng Xia 0001, Binghao Li, Wenjie Zhang 0001. 7191-7199 [doi]
- InteractGuide: LLM-Enhanced Multimodal Reasoning for User-Centric Interaction Recommendations in AR-HRI AuthoringYunqiang Pei, Hongrong Yang, Kaiyue Zhang, Guoqing Wang 0001, Peng Wang 0023, Chaoning Zhang, Yang Yang 0002, Heng Tao Shen. 7200-7209 [doi]
- FractalForensics: Proactive Deepfake Detection and Localization via Fractal WatermarksTianyi Wang 0006, Harry Cheng 0002, Ming-hui Liu, Mohan Kankanhalli. 7210-7219 [doi]
- TSGS: Improving Gaussian Splatting for Transparent Surface Reconstruction via Normal and De-lighting PriorsMingwei Li, Pu Pang, Hehe Fan, Hua Huang 0001, Yi Yang 0001. 7220-7229 [doi]
- DSDNet: Raw Domain Demoiréing via Dual Color-Space SynergyQirui Yang, Fangpu Zhang, Yeying Jin, Qihua Cheng, Peng-Tao Jiang, Huanjing Yue, Jingyu Yang. 7230-7238 [doi]
- Ex Pede Herculem, Predicting Global Actionness Curve from Local ClipsXu Chen 0053, Yang Li, Yahong Han, Jialie Shen 0001. 7239-7247 [doi]
- BiECVC: Gated Diversification of Bidirectional Contexts for Learned Video CompressionWei Jiang 0031, Junru Li, Kai Zhang, Li Zhang 0136. 7248-7257 [doi]
- Test-Time Model Adaptation for Quantized Neural NetworksZeshuai Deng, Guohao Chen, Shuaicheng Niu, Hui Luo 0002, Shuhai Zhang, Yifan Yang, Renjie Chen, Wei Luo, Mingkui Tan. 7258-7267 [doi]
- EDPC: Accelerating Lossless Compression via Lightweight Probability Models and Decoupled Parallel DataflowZeyi Lu, Xiaoxiao Ma, Yujun Huang, Minxiao Chen, Bin Chen 0011, Baoyi An, Shu-Tao Xia. 7268-7276 [doi]
- ALDEN: Dual-Level Disentanglement with Meta-learning for Generalizable Audio Deepfake DetectionYuxiong Xu, Bin Li 0011, Weixiang Li, Sara Mandelli, Viola Negroni, Sheng Li. 7277-7286 [doi]
- FlexGaussian: Flexible and Cost-Effective Training-Free Compression for 3D Gaussian SplattingBoyuan Tian, Qizhe Gao, Siran Xianyu, Xiaotong Cui, Minjia Zhang. 7287-7296 [doi]
- Learning Adaptive Node Selection with External Attention for Human Interaction RecognitionChen Pang, Xuequan Lu, Qianyu Zhou 0001, Lei Lyu 0001. 7297-7306 [doi]
- To Remember, To Adapt, To Preempt: A Stable Continual Test-Time Adaptation Framework for Remote Physiological Measurement in Dynamic Domain ShiftsShuyang Chu, Jingang Shi, Xu Cheng 0003, Haoyu Chen 0001, Xin Liu 0012, Jian Xu, Guoying Zhao 0001. 7307-7316 [doi]
- Robust Modality-Incomplete Anomaly Detection: A Modality-Instructive Framework with BenchmarkBingchen Miao, Wenqiao Zhang, Juncheng Li 0006, Wangyu Wu, Siliang Tang, Zhaocheng Li, Haochen Shi, Jun Xiao 0001, Yueting Zhuang. 7317-7326 [doi]
- The Devil in the Stego Image: Far from Being Usable in Real-World ScenariosHuanqi Wu 0001, Huangbiao Xu, Xiao Ke. 7327-7335 [doi]
- Inter-Task Weaving in Image Enhancement: From a New Unified Architecture to a Better Meta-Representation LearningNan An, Siqi Xu, Long Ma 0002, Zhu Liu 0004, Guangchao Han, Tengyu Ma 0004, Risheng Liu. 7336-7345 [doi]
- Multi-view Graph Clustering with Dual Relation Optimization for Remote Sensing DataRenxiang Guan, Junhong Li, Siwei Wang 0001, Wenxuan Tu, Miaomiao Li 0001, En Zhu, Xinwang Liu 0002, Ping Chen. 7346-7355 [doi]
- E-4DGS: High-Fidelity Dynamic Reconstruction from the Multi-view Event CamerasChaoran Feng 0001, Zhenyu Tang 0004, Wangbo Yu, Yatian Pang, Yian Zhao, Jianbin Zhao, Li Yuan 0007, Yonghong Tian 0001. 7356-7365 [doi]
- FocusTrack: One-Stage Focus-and-Suppress Framework for 3D Point Cloud Object TrackingSifan Zhou, Jiahao Nie 0001, Ziyu Zhao, Yichao Cao, Xiaobo Lu. 7366-7375 [doi]
- SAMVSR: Leveraging Semantic Priors to Zone-Focused Mamba for Video Snow RemovalHongtao Wu, Yifeng Wu, Jiaxuan Jiang 0001, Chengyu Wu, Hong Wang 0021, Yefeng Zheng 0001. 7376-7385 [doi]
- MS-Road: Towards Spatiotemporal-Consistent Large-Scale Road ReconstructionZe Huang, Zhongyang Xiao, Mingliang Song, Yu Fang, Hongyuan Yuan, Kevin Li Sun, Li Zhang. 7386-7394 [doi]
- Imagining Vision From Language for Few-Shot Class-Incremental LearningShuo Li 0010, Xingchen Liu, Fang Liu 0001, Licheng Jiao, Jiahao Wang 0002, Xinyan Huang, Yanbiao Ma, Puhua Chen, Lingling Li 0002, Xu Liu 0006, Xuejian Gou. 7395-7404 [doi]
- Relightable and Dynamic Gaussian Avatar Reconstruction from Monocular VideoSeonghwa Choi, Moonkyeong Choi, Mingyu Jang, Jaekyung Kim, Jianfei Cai 0001, Wen-Huang Cheng, Sanghoon Lee 0001. 7405-7414 [doi]
- CLIP-Guided Backdoor Defense through Entropy-Based Poisoned Dataset SeparationBinyan Xu, Fan Yang, Xilin Dai, Di Tang 0001, Kehuan Zhang. 7415-7423 [doi]
- UMSD: High Realism Motion Style Transfer via Unified Mamba-based DiffusionZiyun Qian, Zeyu Xiao 0001, Xingliang Jin, Dingkang Yang, Mingcheng Li, Zhenyi Wu, Dongliang Kou, Peng Zhai, Lihua Zhang 0002. 7424-7433 [doi]
- Farther Than Mirror: Explore Pattern-Compensated Depth of Mirror with Temporal Changes for Video Mirror DetectionZhaohu Xing, Lihao Liu, Tian Ye 0001, Sixiang Chen, Yijun Yang, Guang Liu 0006, Lei Zhu 0003. 7434-7443 [doi]
- OpenMap: Instruction Grounding via Open-Vocabulary Visual-Language MappingDanyang Li, Zenghui Yang, Guangpeng Qi, Songtao Pang, Guangyong Shang, Qiang Ma 0007, Zheng Yang 0002. 7444-7452 [doi]
- Adaptive Prompt Learning for Blind Image Quality Assessment with Multi-modal Mixed-datasets TrainingYan Zhong 0001, Xinping Zhao, Li Zhang 0104, Xinyuan Song 0002, Tingting Jiang 0001. 7453-7462 [doi]
- DynMark: A Robust Watermarking Solution for Dynamic Screen Content with Small-size Screenshot SupportChangyu Rao, Gaozhi Liu, Sheng Li 0006, Xinpeng Zhang 0001, Zhenxing Qian. 7463-7471 [doi]
- Wild3A: Novel View Synthesis from Any Dynamic Images in SecondsMingrui Li, Shuhao Zhai, Zibing Zhao, Luyue Sun, Xinxiao Wang, Dong Li, Shuhong Liu, Hongyu Wang. 7472-7480 [doi]
- DRMix: Decomposition-Recomposition Data Augmentation with Diffusion ModelShuo Wang, Zhichuan Wang, Yanmin Chen, Mengyao Zhou, Jun Luo. 7481-7489 [doi]
- Revealing Latent Information: A Physics-inspired Self-supervised Pre-training Framework for Noisy and Sparse EventsLin Zhu 0012, Ruonan Liu, Xiao Wang 0014, Lizhi Wang 0001, Hua Huang 0001. 7490-7499 [doi]
- Contrastive Regularization over LoRA for Multimodal Biomedical Image Incremental LearningHaojie Zhang, Yixiong Liang, Hulin Kuang, LiHui Cen, Zhe Qu, Yigang Cen, Min Zeng 0004, Shichao Kan. 7500-7509 [doi]
- CCStereo: Audio-Visual Contextual and Contrastive Learning for Binaural Audio GenerationYuanhong Chen, Kazuki Shimada, Christian Simon, Yukara Ikemiya, Takashi Shibuya 0001, Yuki Mitsufuji. 7510-7518 [doi]
- Polarimetric Monocular Gaussian Splatting SLAM for Dense Surface ReconstructionHaitao Wang, Sijia Wen, Bo Guo. 7519-7528 [doi]
- OpenMoCap: Rethinking Optical Motion Capture under Real-world OcclusionChen Qian 0001, Danyang Li, Xinran Yu, Zheng Yang 0002, Qiang Ma 0007. 7529-7537 [doi]
- MEDTalk: Multimodal Controlled 3D Facial Animation with Dynamic Emotions by Disentangled EmbeddingChang Liu 0021, Ye Pan, Chenyang Ding, Susanto Rahardja, Xiaokang Yang 0001. 7538-7547 [doi]
- Addressing Granularity-induced Semantic Drift in OvOD via Graph-guided semantically consistent representationHongyan Xu 0002, Zhongze Wu, Ang He, Xi Lin 0003, Yi Chen, Xiu Su. 7548-7557 [doi]
- Lava: Language Driven Scalable and Versatile Traffic Video AnalyticsYanrui Yu, Tianfei Zhou, Jiaxin Sun, Lianpeng Qiao, Lizhong Ding 0003, Ye Yuan 0001, Guoren Wang. 7558-7567 [doi]
- Meta-Knowledge Path Augmentation for Multi-Hop Reasoning on Satellite Commonsense Multi-Modal Knowledge GraphsQian Li 0033, Siyuan Liang, Yuzheng Zhang, Cheng Ji 0001, Zongyu Chang, Shangguang Wang. 7568-7577 [doi]
- Re-Activating Frozen Primitives for 3D Gaussian SplattingYuxin Cheng, Binxiao Huang, Wenyong Zhou, Taiqiang Wu, Zhengwu Liu, Graziano Chesi, Ngai Wong 0001. 7578-7586 [doi]
- From Guesswork to Guarantee: Towards Faithful Multimedia Web Forecasting with TimeSieveSongning Lai, Ninghui Feng, Jiechao Gao, Hao Wang, Haochen Sui, Xin Zou 0001, Jiayu Yang, Wenshuo Chen, Lijie Hu, Hang Zhao 0010, Xuming Hu, Yutao Yue. 7587-7595 [doi]
- Towards Generalized Physical Occlusion Detection On DocumentsYiang Zhu, Haoyue Wang, Zhenxing Qian, Sheng Li 0006, Xinpeng Zhang 0001, Jian Liu. 7596-7605 [doi]
- EHPE: A Segmented Architecture for Enhanced Hand Pose EstimationBolun Zheng, Xinjie Liu, Qianyu Zhang 0002, Canjin Wang, Fangni Chen, Mingen Xu. 7606-7615 [doi]
- Video Instance Segmentation by Weighted Structure InferenceZheyun Qin, Deng Yu, Yang Shi, Qiangchang Wang, Zhumin Chen. 7616-7624 [doi]
- FAB-Attack: Fabric-Aware Adversarial Attacks on Person Detectors under Motion BlurJiaqi Hou, Kewei Zhang, Tianyu Yang, Chengyu Jia, Qiqi Lin, Hui Wei 0004, Zheng Wang 0007. 7625-7634 [doi]
- DualEnhance: External Multimodal Foundation Models Guidance and Internal Fast-Slow Teacher RegulationQi He, Xiao Wu 0001, Jun-Yan He, Wei Li 0110, Zhaoquan Yuan. 7635-7643 [doi]
- IPCMoE: Integrating Perceptual Cues with Mixture-of-Experts for Joint Low-Light Image Enhancement and DeblurringYuezhou Li, Yuzhen Niu, Huangbiao Xu, Hui Da, Rui Xu 0028, Wenxi Liu. 7644-7652 [doi]
- PUMA: Layer-Pruned Language Model for Efficient Unified Multimodal Retrieval with Modality-Adaptive LearningYibo Lyu, Rui Shao 0001, Gongwei Chen, Yijie Zhu, Weili Guan, Liqiang Nie. 7653-7662 [doi]
- Cam-Bench: A Benchmark for Image-based Camera Parameter EstimationQuanhong Peng, Dan Zhang 0016, Dong Zhao, Jianpeng Zhang, Meihua Song, Chenlei Lv. 7663-7671 [doi]
- Learning Discrepant Transformations for Face Privacy ProtectionChenda Wei, Haoyue Wang, Zhenxing Qian, Sheng Li 0006, Xinpeng Zhang 0001, Jian Liu. 7672-7680 [doi]
- SPHERE: Semantic-PHysical Engaged REpresentation for 3D Semantic Scene CompletionZhiwen Yang, Yuxin Peng 0001. 7681-7690 [doi]
- PhonoFence: A Cross-Task Defense Framework for DeepFake via Phoneme-Level Adversarial PerturbationsZhaolin Wei, Xiuwen Shi, Dengpan Ye, Yuhan Lin, Zhigang Wang, Jiacheng Deng 0003, Ziyi Liu 0009, Long Tang. 7691-7699 [doi]
- Motion-Aware Adaptive Pixel Pruning for Efficient Local Motion DeblurringWei Shang 0001, Dongwei Ren, Wanying Zhang, Pengfei Zhu 0001, Qinghua Hu, Wangmeng Zuo. 7700-7708 [doi]
- Uni-Layout: Integrating Human Feedback in Unified Layout Generation and EvaluationShuo Lu, Yanyin Chen, Wei Feng, Jiahao Fan, Fengheng Li, Zheng Zhang, Jingjing Lv, Junjie Shen 0008, Ching Law, Jian Liang. 7709-7718 [doi]
- Seeing from Magic Mirror: Contrastive Learning from Reconstruction for Pose-based Gait RecognitionShibei Meng, Saihui Hou, Yang Fu, Xuecai Hu, JunZhou Huang, Yongzhen Huang. 7719-7728 [doi]
- Mitigating Long-tail Distribution in Oracle Bone Inscriptions: Dataset, Model, and BenchmarkJinhao Li 0001, Zijian Chen 0001, Runze Jiang, Tingzhu Chen, Changbo Wang, Guangtao Zhai. 7729-7738 [doi]
- Scalable Multi-view Clustering based on Tight Anchor DistributionYawei Chen, Huibing Wang, Mingze Yao, Jinjia Peng, Guangqi Jiang, Jiqing Zhang. 7739-7747 [doi]
- Dual-Constraint Multi-view Fuzzy Clustering with Scalable Anchor Graph LearningLuyan Cui, Huibing Wang, Yawei Chen, Mingze Yao, XianPing Fu, Jiqing Zhang. 7748-7756 [doi]
- UniFlowRestore: A General Video Restoration Framework via Flow Matching and Prompt GuidanceShuning Sun, Yu Zhang, Chen Wu 0006, Dianjie Lu, Guijuan Zhang, Yang Wen, Zhuoran Zheng. 7757-7765 [doi]
- Enhanced Dual-Pixel Image Reflection Removal via Gaussian SplattingKailong Yu, Liyuan Pan, Liu Liu 0009, Wei Liang 0008. 7766-7775 [doi]
- Dynamic Beauty is Easy to Find: A Large-Scale Composition-Aware Dataset and an End-to-End Framework for Video ReframingSitian Gu, Zhiyu Pan, Chaoyi Hong, Chengxin Liu, Zhiguo Cao 0001. 7776-7784 [doi]
- UltraVSR: Achieving Ultra-Realistic Video Super-Resolution with Efficient One-Step Diffusion SpaceYong Liu 0031, Jinshan Pan, Yinchuan Li, Qingji Dong, Chao Zhu 0007, Yu Guo 0006, Fei Wang 0008. 7785-7794 [doi]
- CaDGS: Modeling Inter-Gaussian Mutual Information for Dynamic Novel View SynthesisYunlong Zhao 0003, Xiaoheng Deng, Zhuohua Qiu, Feng Yang, Chang Xu 0002, Xiangjian He, Shan You, Xiu Su. 7795-7804 [doi]
- AtlantisGS: Underwater Sparse-View Scene Reconstruction via Gaussian SplattingJingjun Yi, Qi Bi, Hao Zheng 0008, Huimin Huang 0002, Haolan Zhan, Yixian Shen, Wei Ji 0011, Yawen Huang, Yuexiang Li, Xian Wu 0001, Yefeng Zheng 0001. 7805-7814 [doi]
- MARL-MambaContour: Unleashing Multi-Agent Deep Reinforcement Learning for Active Contour Optimization in Medical Image SegmentationRuicheng Zhang, Yu Sun, Zeyu Zhang, Jinai Li, Xiaofan Liu, Hoi Fan Au, Haowei Guo, Puxin Yan. 7815-7824 [doi]
- Unified Medical Image Segmentation with State Space Modeling SnakeRuicheng Zhang, Haowei Guo, Kanghui Tian, Jun Zhou, Mingliang Yan, Zeyu Zhang, Shen Zhao. 7825-7834 [doi]
- Latent Interactiveness Field for Non-Contact Human Object Interaction DetectionXiang Huang 0004, Ao Luo, Xiao Wu 0001, Zhaoquan Yuan. 7835-7843 [doi]
- HybridPlane: A General 4D Representation for Dynamic Scene ReconstructionRu Jia, Xiaoqian Liang, Xubin Duan, Jianji Wang 0001, Nanning Zheng 0001. 7844-7853 [doi]
- CookAnything: A Framework for Flexible and Consistent Multi-Step Recipe Image GenerationRuoxuan Zhang, Bin Wen, Hongxia Xie, Yi Yao, Songhan Zuo, Jian-Yu Jiang-Lin, Hong-Han Shuai, Wen-Huang Cheng. 7854-7863 [doi]
- Cross-Model Watermarking via Discriminative Samples for Secure AuthenticationJuan Zhao, Yudao Sun, Zhihai Yang, Cai Xu, Hongji Chen 0005, Fan Zhang, Jianxin Li 0001. 7864-7873 [doi]
- Learning Invariant Discriminative Patterns for Unified Anomaly DetectionChengcheng Xing, Yanyu Xu, Yonghui Xu, LiZhen Cui. 7874-7882 [doi]
- Cross-Domain Attribute Alignment with CLIP: A Rehearsal-Free Approach for Class-Incremental Unsupervised Domain AdaptationKerun Mi, Guoliang Kang, Guangyu Li, Lin Zhao 0003, Tao Zhou 0002, Chen Gong 0002. 7883-7892 [doi]
- PESTalk: Speech-Driven 3D Facial Animation with Personalized Emotional StylesTianshun Han, Benjia Zhou, Ajian Liu 0001, Yanyan Liang 0001, Du Zhang, Zhen Lei 0001, Jun Wan 0001. 7893-7901 [doi]
- DAPT: Domain-Aware Prompt-Tuning for Multimodal Fake News DetectionYu Tong, Weihai Lu, Xiaoxi Cui, Yifan Mao, Zhejun Zhao. 7902-7911 [doi]
- ChoreoMuse: Robust Music-to-Dance Video Generation with Style Transfer and Beat-Adherent MotionXuanchen Wang, Heng Wang 0007, Weidong Cai 0001. 7912-7921 [doi]
- GMML: Gradient-Modulated Robustness for Imbalance-Aware Multimodal LearningZikai Zhang 0004, Xu Zhang, Ziyi Li, Yidong Li, Yuanzhouhan Cao. 7922-7930 [doi]
- EIR-SDG: Explore Invariant Representation for Single-source Domain Generalization in Medical Image SegmentationZiwei Niu, Shiao Xie, Ziyue Wang, Yen-Wei Chen 0001, Yueming Jin, Lanfen Lin. 7931-7939 [doi]
- A Multi-illumination Dataset and an Illumination Domain Adaptation Network for Finger Vein IdentificationHuabin Wang, Yingfan Cheng, Wu Zheng, Jiayuan Cheng, Xin Li, Min Li 0033, Fei Liu. 7940-7948 [doi]
- Towards Blind Bitstream-corrupted Video Recovery: A Visual Foundation Model-driven FrameworkTianyi Liu, Kejun Wu, Chen Cai, Yi Wang 0068, Kim-Hui Yap, Lap-Pui Chau. 7949-7958 [doi]
- Activation and Weight Distribution Balancing for Optimal Post-Training Quantization in Learned Image CompressionJie Yu, Songping Mai, Peng Zhang, Yucheng Jiang, Jian Cheng. 7959-7967 [doi]
- BEAM: Bridging Physically-based Rendering and Gaussian Modeling for Relightable Volumetric VideoYu Hong, Yize Wu, Zhehao Shen, Chengcheng Guo, Yuheng Jiang, Yingliang Zhang, Qiang Hu 0003, Jingyi Yu 0002, Lan Xu 0003. 7968-7977 [doi]
- Bridging Domains in Mental Stress Assessment via Retrieval-Augmented ReasoningYi Dai, Yang Ding 0003, Kaisheng Zeng. 7978-7987 [doi]
- Open-Vocabulary 3D Affordance Understanding via Functional Text Enhancement and Multilevel Representation AlignmentLin Wu, Wei Wei, Peizhuo Yu, Jianglin Lan. 7988-7997 [doi]
- A Large-Scale Dataset for Short-Video Topic Peak Prediction and a Large Heterogeneous Graph ModelShangheng Chen, Shengsheng Qian, Quan Fang, Jun Hu 0016, Changsheng Xu. 7998-8007 [doi]
- DAFU-CAD: Depth-assisted Feature Unraveling for Sketch-based Robust CAD ModelingYue Sun, Xinqi Liu, Zhiliang He, Jialu Zhang 0006, Chenming Wu, Guodong Lu, Jituo Li. 8008-8017 [doi]
- Efficient Multi-Slide Visual-Language Feature Fusion for Placental Disease ClassificationHang Guo, Qing Zhang, Zixuan Gao, Siyuan Yang, Shulin Peng, Xiang Tao, Ting Yu, Yan Wang 0033, Qingli Li. 8018-8027 [doi]
- DeflareMamba: Hierarchical Vision Mamba for Contextually Consistent Lens Flare RemovalYihang Huang, Yuanfei Huang, Junhui Lin, Hua Huang 0001. 8028-8037 [doi]
- MADPHash: Manipulation-Aware Deep Perceptual Hashing using Feature ConsistencyLizhi Xiong, Peipeng Yu, Yue Wu. 8038-8047 [doi]
- AnomalyControl: Highly-Aligned Anomalous Image Generation with Controlled Diffusion ModelYuanyi Duan, Wei Xu, Qinlong Wu, Guo-Sen Xie, Fang Zhao 0006, Caifeng Shan. 8048-8057 [doi]
- HOPNet: Learning Hand-Object-Person Interaction Network for Hand Contact State DetectionWei Li 0110, Yizhao Wan, Xiao Wu 0001, Jianshuai Wang, Penglin Dai, Zhaoquan Yuan. 8058-8066 [doi]
- Explicit Context Reasoning with Supervision for Visual TrackingFansheng Zeng, Bineng Zhong 0001, Haiying Xia, Yufei Tan, Xiantao Hu, Liangtao Shi, Shuxiang Song 0001. 8067-8076 [doi]
- LooBox: Loose-box-supervised 3D Tumor Segmentation with Self-correcting Bidirectional LearningTianzhong Lan, Zhang Yi 0001, Xiuyuan Xu, Min Zhu 0005. 8077-8086 [doi]
- Online Continual Learning via Dynamic Expandable Recursive ModelFei Ye, Adrian G. Bors. 8087-8096 [doi]
- TolerantECG: A Foundation Model for Imperfect ElectrocardiogramHuynh Dang Nguyen, Trong-Thang Pham, Ngan Le, Van Nguyen. 8097-8105 [doi]
- Text as Any-Modality for Zero-Shot Classification by Consistent Prompt TuningXiangyu Wu, Feng Yu, Yang Yang 0074, Jianfeng Lu. 8106-8115 [doi]
- Direction-Aware Room Impulse Response Estimation for Immersive Audio Rendering in Real EnvironmentsGiovanni Zanin, Ritujoy Biswas, Pietro Morerio, Sylvio Barbon Junior, Alberto Carini, Alessio Del Bue, Vittorio Murino. 8116-8124 [doi]
- HGC-Avatar: Hierarchical Gaussian Compression for Streamable Dynamic 3D AvatarsHaocheng Tang, Ruoke Yan, Xinhui Yin, Qi Zhang 0042, Xinfeng Zhang 0001, Siwei Ma 0001, Wen Gao 0001, Chuanmin Jia. 8125-8134 [doi]
- Method and Applications of Solid-State Lidar Modeling for X-in-the-Loop Testing of Autonomous VehiclesCheng Peng, Zhen Wang. 8135-8143 [doi]
- Dynamic 2D Gaussians: Geometrically Accurate Radiance Fields for Dynamic ObjectsShuai Zhang 0050, Guanjun Wu, Zhoufeng Xie, Xinggang Wang, Bin Feng 0001, Wenyu Liu 0001. 8144-8153 [doi]
- Noise-Robust Cross-modal Learning for Reliable 2D-3D RetrievalAo Yang, Yanglin Feng, Yuan Sun 0016, Dezhong Peng, Guiduo Duan, Yang Qin. 8154-8163 [doi]
- Dynamic Analysis and Adaptive Discriminator for Fake News DetectionXinqi Su, Zitong Yu, Yawen Cui, Ajian Liu 0001, Xun Lin, Yuhao Wang, Haochen Liang, Wenhui Li, Li Shen 0008, Xiaochun Cao. 8164-8173 [doi]
- PRE-MAP: Personalized Reinforced Eye-tracking Multimodal LLM for High-Resolution Multi-Attribute Point PredictionHanbing Wu, Ping Jiang, Anyang Su, Chenxu Zhao, Tianyu Fu 0001, Minghui Wu, Beiping Tan, Huiying Li. 8174-8183 [doi]
- DIME-Net: A Dual-Illumination Adaptive Enhancement Network Based on Retinex and Mixture-of-ExpertsZiang Wang, Xiaoqin Wang, Dingyi Wang, Qiang Li, Shushan Qiao. 8184-8193 [doi]
- ST-SAM: SAM-Driven Self-Training Framework for Semi-Supervised Camouflaged Object DetectionXihang Hu, Fuming Sun, Jiazhe Liu, Feilong Xu, Xiaoli Zhang. 8194-8203 [doi]
- Uni-DocDiff: A Unified Document Restoration Model Based on DiffusionFangmin Zhao, Weichao Zeng, Zhenhang Li, Dongbao Yang, Binbin Li, Xiaojun Bi 0002, Yu Zhou 0015. 8204-8213 [doi]
- SizeGS: Size-aware Compression of 3D Gaussian Splatting via Mixed Integer ProgrammingShuzhao Xie, Jiahang Liu, Weixiang Zhang, Shijia Ge, Sicheng Pan, Chen Tang, Yunpeng Bai, Cong Zhang 0002, Xiaoyi Fan 0001, Zhi Wang 0001. 8214-8223 [doi]
- Entity-Level Alignment with Prompt-Guided Adapter for Remote Sensing Image-Text RetrievalShuoshuo Li, Shuli Cheng, Liejun Wang. 8224-8233 [doi]
- Flowing Crowd to Count Flows: A Self-Supervised Framework for Video Individual CountingFeng-Kai Huang, Bo-Lun Huang, Li-Wu Tsao, Jhih-Ciang Wu, Hong-Han Shuai, Wen-Huang Cheng. 8234-8243 [doi]
- SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction UnderstandingQianqian Sun, Jixiang Luo, Dell Zhang, Xuelong Li 0001. 8244-8252 [doi]
- Versatile Multimodal Controls for Expressive Talking Human AnimationZheng Qin, Ruobing Zheng, Yabing Wang, Tianqi Li, Zixin Zhu, Sanping Zhou, Ming Yang 0007, Le Wang 0003. 8253-8262 [doi]
- EventLip: Enhancing Event-Based Lip Reading via Frequency-Aware Spatiotemporal Hypergraph ModelingXueyi Zhang, Jialu Sun, Chengwei Zhang, Xianghu Yue, Tianfang Xiao, Siqi Cai 0002, Mingrui Lao, Haizhou Li 0001. 8263-8272 [doi]
- Layer Separation: Towards Adjustable Joint Space Width Images SynthesisHaolin Wang 0007, Yafei Ou, Prasoon Ambalathankandy, Gen Ota, Pengyu Dai, Masayuki Ikebe, Kenji Suzuki 0001, Tamotsu Kamishima. 8273-8282 [doi]
- DINOv2 Driven Gait Representation Learning for Video-Based Visible-Infrared Person Re-identificationYujie Yang, Shuang Li, Jun Ye, Neng Dong, Fan Li 0006, Huafeng Li 0001. 8283-8292 [doi]
- BAPEN: Towards Versatile Audio Phase RetrievalLingling Dai, Andong Li, Zhe Han, Chengshi Zheng, Xiaodong Li 0002. 8293-8302 [doi]
- PET-GPRA: Rethinking PET with Gradient-Aware Prompting and Router-Free Adapters for Few-shot Class-Incremental LearningYishu Liu, Zhiming Chen, Desen Wang, Xiaoling Luo, Bingzhi Chen, Guangming Lu 0002. 8303-8312 [doi]
- Cued-Agent: A Collaborative Multi-Agent System for Automatic Cued Speech RecognitionGuanjie Huang, Danny H. K. Tsang, Shan Yang 0001, Guangzhi Lei, Li Liu 0036. 8313-8321 [doi]
- FutureGS: Structured Gaussian Fields for Future-Aware Dynamic Scene ModelingMingyang Ding, Zhan Wang, Jiachen Wang, Tingting Han 0003, Xinyuan Hu, Jiajun Ding, Min Tan 0005, Zhenzhong Kuang. 8322-8331 [doi]
- IFS-Light: An Interactive Framework for Single-view Face Relighting with both Facial and Lighting ConsistencyShuyang Wang, Chunxiao Li, Anlong Ming. 8332-8340 [doi]
- 3D Gaussian Splatting Data Compression with Mixture of PriorsLei Liu, Zhenghao Chen, Dong Xu 0001. 8341-8350 [doi]
- BSGS: Bi-Stage 3D Gaussian Splatting for Camera Motion DeblurringAn Zhao, Piaopiao Yu, Zhe Zhu, Mingqiang Wei. 8351-8359 [doi]
- Frequency Domain Distributed Perturbations: Towards Query-Efficient Black-Box Adversarial Video AttackTeng Jin, Ziwen He, Zhangjie Fu, Songping Wang, Yueming Lyu, Yufei Shi. 8360-8368 [doi]
- Synthetic-to-Real Camouflaged Object DetectionZhiHao Luo, Luojun Lin, Zheng Lin 0005. 8369-8378 [doi]
- FSCDiff: Frequency-Spatial Entangled Conditional Diffusion model for Underwater Salient Object DetectionHua Li 0012, Gaowei Lin, Zhiyuan Li, Sam Kwong, Runmin Cong. 8379-8388 [doi]
- Dynamic Residual Encoding with Slide-Level Contrastive Learning for End-to-End Whole Slide Image RepresentationJing Jin, Xu Liu, Te Gao, Zhihong Shi, Yixiong Liang, Ruiqing Zheng, Hulin Kuang, Min Zeng 0004, Shichao Kan. 8389-8398 [doi]
- DilateQuant: Accurate and Efficient Quantization-Aware Training for Diffusion Models via Weight DilationXuewen Liu, Zhikai Li, Minghao Jiang, Mengjuan Chen, Jianquan Li, Qingyi Gu. 8399-8408 [doi]
- StePO-Rec: Towards Personalized Outfit Styling Assistant via Knowledge-Guided Multi-Step ReasoningYuxi Bi, Yunfan Gao, Haofen Wang. 8409-8417 [doi]
- NIVM: Real-time View Morphing via Neural Implicit FunctionTung-I Chen, Dae Yeol Lee, Guan-Ming Su, Mohammad Hajiesmaili, Ramesh K. Sitaraman. 8418-8427 [doi]
- DSPF: Dual-Stage Preservation and Fusion for Source-Free Domain Adaptive Point Cloud CompletionZhiqian Xia, Haifeng Xia, Shichao Jin, Wei Wang 0335, Zhengming Ding, Xiaochun Cao. 8428-8437 [doi]
- FluidGS: Physics Informed Gaussian Splatting for Dynamic Fluid Reconstruction from Sparse ViewsYouchen Xie, Chen Li 0035, Sheng Qiu, Zhi-Jun Wang, Chenhui Li 0001, Yibo Zhao 0001, Zan Gao, Changbo Wang. 8438-8447 [doi]
- Cross Paradigm Representation and Alignment Transformer for Image DerainingShun Zou, Yi Zou, Juncheng Li 0003, Guangwei Gao, Guo-Jun Qi. 8448-8457 [doi]
- DiffuFuse: Diffusion-Driven Dual-Stream Fusion Framework for Multimodal Sentiment AnalysisXiongjian Lv, Yimin Wen, Hang Yu. 8458-8467 [doi]
- Detecting Forged HEVC Videos via Anomalous Bitrate-Compressed Traces: A Frame-Level Bitrate Analysis FrameworkLizhi Xiong, Linsen Ding, Ziqiang Li. 8468-8477 [doi]
- TextSplat: Text-Guided Semantic Fusion for Generalizable Gaussian SplattingZhicong Wu, Hongbin Xu, Gang Xu, Ping Nie, Zhixin Yan, Jinkai Zheng, Liangqiong Qu, Ming Li 0073, Liqiang Nie. 8478-8487 [doi]
- F-DDIM: A Featurized Denoising Diffusion Implicit Model for Facial Image SteganographyLiqi Yan, Xuebin Li, Jianhui Zhang, Fangli Guan, Kanglei Peng, Pan Li. 8488-8496 [doi]
- MoTAS: MoE-Guided Feature Selection from TTS-Augmented Speech for Enhanced Multimodal Alzheimer's Early ScreeningYongqi Shao 0001, Bingxin Mei, Cong Tan, Hong Huo, Tao Fang. 8497-8505 [doi]
- From Outline to Detail: An Hierarchical End-to-end Framework for Coherent and Consistent Visual Novel Generation and AssemblyYilin Zhang, Yanyan Wei, Zhao Zhang 0001, Jicong Fan 0001, Haijun Zhang 0002, Shuicheng Yan. 8506-8516 [doi]
- Graph-Guided Dual-Level Augmentation for 3D Scene SegmentationHongbin Lin, Yifan Jiang, Juangui Xu, Jesse Jiaxi Xu, Yi Lu, Zhengyu Hu, Ying-Cong Chen, Hao Wang 0094. 8517-8526 [doi]
- Positive Style Accumulation: A Style Screening and Continuous Utilization Framework for Federated DG-ReIDXin Xu 0007, Chaoyue Ren, Wei Liu 0183, Wenke Huang 0003, Bin Yang 0026, Zhixi Yu, Kui Jiang. 8527-8536 [doi]
- Video-Level Multimodal Relation Extraction with Event-Entity Semantic ConsistencyZefan Zhang, Weiqi Zhang, Kailong Suo, Yanhui Li, Tian Bai 0002. 8537-8546 [doi]
- Query-Based Audio-Visual Temporal Forgery Localization with Register-Enhanced Representation LearningXiaodong Zhu, Suting Wang, Junqi Yang, Yuhong Yang 0001, Weiping Tu, Zhongyuan Wang 0001. 8547-8556 [doi]
- Enhancing HOI Detection with Contextual Cues from Large Vision-Language ModelsYu-Wei Zhan, Fan Liu 0008, Xin Luo 0006, Xin-Shun Xu, Liqiang Nie, Mohan Kankanhalli. 8557-8566 [doi]
- Seg-Wild: Interactive Segmentation based on 3D Gaussian Splatting for Unconstrained Image CollectionsYongtang Bao, Chengjie Tang, Yuze Wang 0006, Haojie Li. 8567-8576 [doi]
- Two-View Correspondence Pruning via Channel-Spatial Interaction and Bidirectional Consensus InteractionXiangui Huang, Taotao Lai, Yizhang Liu, Shuyuan Lin, Zuoyong Li. 8577-8585 [doi]
- Spatial Imputation Drives Cross-Domain Alignment for EEG ClassificationHongjun Liu, Chao Yao, Yalan Zhang, Xiaokun Wang 0001, Xiaojuan Ban. 8586-8595 [doi]
- Saliency-Guided Adaptive Random Diffusion for Remote Sensing Images Restoration with Cloud and HazeWanting Zhang, Jingxuan Zhang, Libao Zhang. 8596-8605 [doi]
- CardioLive: Empowering Video Streaming with Online Cardiac Monitoring via Audio-Visual LearningSheng Lyu, Ruiming Huang, Sijie Ji, Yasar Abbas Ur Rehman, Lan Ma, Chenshu Wu. 8606-8615 [doi]
- Wavelet-GS: 3D Gaussian Splatting with Wavelet DecompositionBeizhen Zhao, Yifan Zhou, Sicheng Yu, Zijian Wang, Hao Wang 0094. 8616-8625 [doi]
- Unicorn: Unified Neural Image Compression with One Number ReconstructionQi Zheng 0004, Haozhi Wang, Zihao Liu, Jiaming Liu, Zhijian Hao, Bu Chen, Min Li 0033, Rui Wan, Peiye Liu, YanHeng Lu, Dimin Niu, Jinjia Zhou, Minge Jing, Yibo Fan. 8626-8635 [doi]
- GalaxAlign: Mimicking Citizen Scientists' Multimodal Guidance for Galaxy Morphology AnalysisRuoqi Wang, Haitao Wang 0026, Qiong Luo 0001. 8636-8644 [doi]
- Low-light Invariant Representation Learning for Visible-Infrared Person Re-identificationDengwen Wang, Guanyu Xing, Yanli Liu 0002. 8645-8653 [doi]
- Focus on Generalization: Improving Adversarial Transferability via Bi-Level Bias MitigationYiqiang Guo, Lei Zhong, Bin Chen, Jia-Li Yin, Xiaolei Liu 0001, Shouling Ji. 8654-8662 [doi]
- Casual3DHDR: High Dynamic Range 3D Gaussian Splatting from Casually Captured VideosShucheng Gong, Lingzhe Zhao, Wenpu Li, Hong Xie, Yin Zhang, Shiyu Zhao 0002, Peidong Liu 0001. 8663-8672 [doi]
- Pair-wise Confidence Difference-based Pseudo-Label Selection for Universal Mismatched SteganalysisFan Wang 0024, Zhangjie Fu, Xiang Zhang 0023, Ziqiang Li, Ziwen He, Manyu Wang. 8673-8681 [doi]
- End-to-End Multiple Object Tracking with Dynamic Scene PerceptionRuonan Wei, Yuntao Wang, Siyan Fang, Yuehuan Wang. 8682-8691 [doi]
- Can I Trust You? Advancing GUI Task Automation with Action Trust ScoreHaiyang Mei, Difei Gao, Xiaopeng Wei, Xin Yang 0011, Mike Zheng Shou. 8692-8700 [doi]
- Boosting Temporal Sentence Grounding via Causal InferenceKefan Tang, Lihuo He, Jisheng Dang, Xinbo Gao 0001. 8701-8710 [doi]
- EgoHierMask: Hierarchical Semantic-Prior Guided Masked Autoencoder for Egocentric Action RecognitionJiang Shao, Xinbo Zhao, Xiaochun Zou, Xiaolin Ye. 8711-8720 [doi]
- Toward Robust Signed Graph Learning through Joint Input-Target DenoisingJunran Wu, Beng Chin Ooi, Ke Xu 0001. 8721-8729 [doi]
- Robust Gaussian Surface Reconstruction with Semantic Aware Progressive PropagationYusen Wang, Huan Zhou, Yu Jiang 0007, Chunxia Xiao. 8730-8739 [doi]
- Sequence-Event Semantic Consistent Learning for Text-to-Motion RetrievalHaoyu Shi, Huaiwen Zhang. 8740-8749 [doi]
- Dual-Prototype Learning in Multiple Instance Learning for Histopathology Image ClassificationTing Xiao 0002, Minqian Sun, Yiqing Xia, Zhe Wang 0002. 8750-8758 [doi]
- SG-FSL: Cross-Domain Few-Shot Learning with Style-Decoupled Augmentation and Gradient-Conflict AdjustmentYunyu Zou, Yishu Liu, Jun Liang 0002, Bingzhi Chen. 8759-8768 [doi]
- DVW: Diffusion Visible WatermarkJiawei Zhang 0011, Xiaoli Jiang, Hao Wang 0060, Lin Yuan 0002, Xiangyang Luo 0001, Bin Ma 0003, Jinwei Wang. 8769-8777 [doi]
- ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer UseKaixin Li, Ziyang Meng, Hongzhan Lin 0001, Ziyang Luo, Yuchen Tian, Jing Ma 0004, Zhiyong Huang 0010, Tat-Seng Chua. 8778-8786 [doi]
- Skeleton Compression and Complementary Enhanced Fusion Under Branch-Stage Supervision for Human Action RecognitionQin Li 0010, Congcong Xiao, Limei Liu, Han Peng, Junfeng Yang. 8787-8796 [doi]
- MVISU-Bench: Benchmarking Mobile Agents for Real-World Tasks by Multi-App, Vague, Interactive, Single-App and Unethical InstructionsZeyu Huang, Juyuan Wang, Longfeng Chen, Boyi Xiao, Leng Cai, Yawen Zeng, Jin Xu. 8797-8805 [doi]
- Symmetrical Awareness Generation for Pelvic Image SegmentationYize Song, Yunqing Chen, Zhou Wang, Cheng Chen 0024, Ruoxiu Xiao. 8806-8814 [doi]
- Gamma: Toward Generic Image Assessment with Mixture of Assessment ExpertsHantao Zhou, Rui Yang 0041, Longxiang Tang, Guanyi Qin, Runze Hu, Xiu Li 0001. 8815-8824 [doi]
- LMME3DHF: Benchmarking and Evaluating Multimodal 3D Human Face Generation with LMMsWoo Yi Yang, Jiarui Wang, Sijing Wu, Huiyu Duan, Yuxin Zhu, Liu Yang, Kang Fu, Guangtao Zhai, Xiongkuo Min. 8825-8834 [doi]
- SAM-Guided Semantic Knowledge Fusion for Visible-Infrared Object DetectionTing Li, Songtao Li, Shuaifeng Li, Xiaolin Qin, Maoyuan Zhao, Luping Ji, Mao Ye 0001. 8835-8844 [doi]
- Test-time Graph OOD Detection via Dynamic Dictionary Expansion and OOD Score CalibrationYue Hou, Yingke Su, Junran Wu, Ke Xu 0001. 8845-8853 [doi]
- DUDA: A Two-stage Decoupling Unsupervised Domain Adaptation Framework for Semi-supervised Singing Melody Extraction from Polyphonic MusicShuai Yu 0002, Xiaoliang He, Kangjie Dong, Yi Yu 0001. 8854-8862 [doi]
- Spatial-Aware Multi-Modal Information Fusion for Food Nutrition EstimationDongjian Yu, Weiqing Min, Xin Jin 0005, Qian Jiang, Shuqiang Jiang. 8863-8871 [doi]
- AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio GenerationYan Rong, Jinting Wang, Guangzhi Lei, Shan Yang 0001, Li Liu 0036. 8872-8881 [doi]
- SUVIS: A Depth- and Motion-Encoded Stereoscopic System for Communicating Forecast UncertaintyLe Liu, Shizhou Zhang, Di Xu. 8882-8890 [doi]
- ColorDiffuser: Video Colorization with Pretrained Text-to-Image Diffusion ModelsHanyuan Liu, Minshan Xie, Jinbo Xing, Chengze Li, Chi-Sing Leung, Tien-Tsin Wong. 8891-8900 [doi]
- Breaking the Synthetic Barrier: Towards Stable and Generalizable Real-World Image DehazingZhuo Su 0001, Jufeng Li, Yan Zhang 0002, Xin Li 0175, Fuwei Zhang, Yuxin Feng, Fan Zhou 0001. 8901-8909 [doi]
- SepVAMark: Deep Separable Visual-Audio Fusion Watermarking for Source Tracing and Deepfake DetectionChuan Zhang 0003, Zihan Li, Zihao Xu, Xuhao Ren, Liehuang Zhu. 8910-8919 [doi]
- DATE: Dual Prompt Learning with Information Bottleneck for Graph Out-of-Distribution GeneralizationJiayi Zeng, Tao Ren 0002, Changhu Wang, Yifan Wang 0014, Wei Ju 0001, Zhipeng Sun, Xiao Luo 0001. 8920-8929 [doi]
- Phase Distribution Matters: On the Importance of Phase Distribution Alignment (PDA) in Holographic ApplicationsSeungmi Choi, Taehwa Lee, Jun Yeong Cha, Suhyun Jo, Hyunmin Ban, Kwan-Jung Oh, Hyunsuk Ko, Hui-Yong Kim. 8930-8938 [doi]
- LSFDNet: A Single-Stage Fusion and Detection Network for Ships Using SWIR and LWIRYanyin Guo, Runxuan An, Junwei Li 0009, Zhiyuan Zhang 0004. 8939-8948 [doi]
- CODE: Towards Partial Label Graph Learning via Coupled Dual SeparationYiyang Gu, Taian Guo, Hang Zhou 0008, Zihao Chen, Zhiping Xiao 0001, Yifang Qin, Xiao Luo 0001, Wei Ju 0001, Yifan Wang 0014, Ming Zhang 0004. 8949-8958 [doi]
- Multi-Modal Gradual Domain Osmosis: Stepwise Dynamic Learning with Batch Matching for Gradual Domain AdaptationZixi Wang, Yubo Huang, Jingzehua Xu, Jinzhu Wei, Shuai Zhang, Xin Lai. 8959-8967 [doi]
- CLIP-HNet: Hybrid Network with Cross-Modal Guidance for Self-Supervised Remote Sensing DehazingShan Wang 0009, Weisi Lin, Yun Liu 0002, Libao Zhang. 8968-8977 [doi]
- Reading Between the Channels: Knowledge-Augmented Medical Time Series ClassificationXiaoyan Yuan, Wei Wang 0077, Junxin Chen 0001, Xiping Hu. 8978-8987 [doi]
- Through Someone Else's Eyes: Lifelogging Meets Narrative Virtual RealityLiang Xu, Songkai Jia, Cathal Gurrin, Monica Ward, Allie Tran. 8988-8996 [doi]
- Multi-view Collaborative Representation Learning from Noisy Labels for VHR Imagery ClassificationGuangfei Li, Quanxue Gao, Yu Lei, Yichen Bao, Qianqian Wang 0001. 8997-9005 [doi]
- DSF-Net: Dynamic Sparse Fusion of Event-RGB via Spike-Triggered Attention for High-Speed DetectionDongyang Ma, Zhengyu Ma, Wei Zhang 0161, Yonghong Tian 0001. 9006-9015 [doi]
- Degradation-Aware One-Step Diffusion Model for Content-Sensitive Super-Resolution in the DarkTengyu Ma 0004, Jiafa Ruan, Yuetong Wang, Guangchao Han, Zhu Liu 0004, Long Ma 0002, Risheng Liu. 9016-9025 [doi]
- MoCount: Motion-Based Repetitive Action CountingRuocheng Gu, Sen Jia 0003, Yule Ma, Jinqin Zhong, Jenq-Neng Hwang, Lei Li 0050. 9026-9034 [doi]
- Bridging Inter-Class Ambiguity and Spatial Variability in Flexible Object Recognition via Graph DistillationLin Zuo, Kunshan Yang, Mengmeng Jing, Xiangxu Zhao, Jiaqiao Chen. 9035-9043 [doi]
- Rethinking the Reliability of Evidence in End-to-End Fact-Checking from the Causal PerspectiveXubo Liu 0002, Wenya Guo, Ruxue Yan, Xumeng Liu, Ying Zhang 0015, Ru Zhou. 9044-9052 [doi]
- Closing the Feedback Loop in Text2Vis: Refining Visualization with Vision-Language ModelsShengze Shi, Tao Ren 0001, Guoliang Zhu, Guan Dong Feng, Jun Hu 0015. 9053-9061 [doi]
- Neural Additive Adapters for Interpretable Nutrition PredictionVitalii Emelianov 0001, Niki Martinel. 9062-9070 [doi]
- TV-RAG: A Temporal-aware and Semantic Entropy-Weighted Framework for Long Video Retrieval and UnderstandingZongsheng Cao, Yangfan He, Anran Liu, Jun Xie 0003, Feng Chen 0044, Zhepeng Wang 0002. 9071-9079 [doi]
- FastRSR: Efficient and Accurate Road Surface Reconstruction in Bird's Eye ViewYuting Zhao, Yuheng Ji, Xiaoshuai Hao, Shuxiao Li. 9080-9089 [doi]
- RIFTCast: A Template-Free End-to-End Multi-View Live Telepresence Framework and BenchmarkDomenic Zingsheim, Markus Plack, Hannah Dröge, Janelle Pfeifer, Patrick Stotko, Matthias B. Hullin, Reinhard Klein. 9090-9099 [doi]
- RSFomer: Time Series Transformer for Robust Sports Action RecognitionYongan Guo, Zhongyan Zhou, Yuao Wang, Na Zhu, Xuyun Zhang, Hongwang Xiao, Yuan Miao 0001, Bo Li 0103. 9100-9109 [doi]
- ResearchPulse: Building Method-Experiment Chains through Multi-Document Scientific InferenceQi Chen, Jingxuan Wei, Zhuoya Yao, Haiguang Wang, Gaowei Wu, Bihui Yu, Siyuan Li 0002, Cheng Tan 0012. 9110-9119 [doi]
- Hierarchical Disentanglement of Cognitive States for Enhanced Cognitive DiagnosisHengnian Gu, Zhifu Chen, Jin Peng Zhou, Dongdai Zhou. 9120-9129 [doi]
- DiffuQKT: A Diffusion-Based Approach for Improved Question Representation in Knowledge TracingFenghua Yu, Jianwen Sun, Qian Wan 0007, Meicheng Chen, Xiaoxuan Shen, Qing Li 0045. 9130-9139 [doi]
- ELFATT: Efficient Linear Fast Attention for Vision TransformersChong Wu 0007, Maolin Che, Renjie Xu, Zhuoheng Ran, Hong Yan 0001. 9140-9149 [doi]
- JPEG-RAE: Reversible Adversarial Example for Privacy and Copyright Protection of JPEG ImagesDahao Fu, Jiangqun Ni, Jian Zhang 0086. 9150-9158 [doi]
- Ingredients-Guided and Nutrients-Prompted Network for Food Nutrition EstimationDonglin Zhang 0001, Boyuan Ma, Xiaojun Wu 0001, Josef Kittler. 9159-9167 [doi]
- VisAug: Facilitating Speech-Rich Web Video Navigation and Engagement with Auto-Generated Visual AugmentationsBaoquan Zhao, Xiaofan Ma, Qianshi Pang, Ruomei Wang 0001, Fan Zhou 0001, Shujin Lin. 9168-9176 [doi]
- Dialogue-Driven Interactive Dynamic Learning for Text-to-Image Person RetrievalHongyu Liu, Hongwei Ge, Yuxuan Liu 0015, Yaqing Hou. 9177-9185 [doi]
- Generative Multi-Sensory Meditation: Exploring Immersive Depth and Activation in Virtual RealityYuyang Jiang, Binzhu Xie, Lina Xu, Xiaokang Lei, Shi Qiu, Luwen Yu, Pan Hui 0001. 9186-9195 [doi]
- Pathology-Aware Prototype Evolution via LLM-Driven Semantic Disambiguation for Multicenter Diabetic Retinopathy DiagnosisChunzheng Zhu, Yangfang Lin, Jialin Shao, Jianxin Lin, Yijun Wang. 9196-9205 [doi]
- VRMusicStage: A System for Converting Fixed-Camera Music Stage Videos into Immersive VR ContentSeungkyu Leem, SeokHyun Jeong, Yeonho Cho, Yoonjae Lee, Jungjin Lee. 9206-9215 [doi]
- Understand, Refine and Summarize: Multi-View Knowledge Progressive Enhancement Learning for Fake News Video DetectionZhi Zeng, Jiaying Wu, Minnan Luo, Xiangzheng Kong, Zihan Ma 0010, Guang Dai, Qinghua Zheng. 9216-9225 [doi]
- Separate Motion from Appearance: Customizing Motion via Customizing Text-to-Video Diffusion ModelsHuijie Liu, Jingyun Wang 0001, Shuai Ma, Jie Hu 0019, Xiaoming Wei, Guoliang Kang. 9227-9236 [doi]
- Multi-Object Sketch Animation with Grouping and Motion Trajectory PriorsGuotao Liang, Juncheng Hu 0001, XiMing Xing, Jing Zhang, Qian Yu. 9237-9246 [doi]
- Position-LoRA: Enhanced Relation Customization through Structural Prior in Initial Latent NoiseYiming Li, Peng Zhou 0010, Xiaokang Qin, Hongwei Hu, Jun Sun 0005, Yi Xu 0001. 9247-9256 [doi]
- Set You Straight: Auto-Steering Denoising Trajectories to Sidestep Unwanted ConceptsLeyang Li, Shilin Lu, Yan Ren 0002, Adams Wai-Kin Kong. 9257-9266 [doi]
- TiP4GEN: Text to Immersive Panorama 4D Scene GenerationKe Xing, Hanwen Liang, Dejia Xu, Yuyang Yin, Konstantinos N. Plataniotis, Yao Zhao 0001, Yunchao Wei. 9267-9276 [doi]
- Focus Where It Matters: LLM-Guided Regional Identification for Instruction-based Image EditingMinho Park 0002, Youngjoo Jo, Jae-Hyeok Lee 0001, Jiyong Lee, Dong-Oh Kang, Yong Man Ro. 9277-9286 [doi]
- Chain-of-Cooking: Cooking Process Visualization via Bidirectional Chain-of-Thought GuidanceMengling Xu, Ming Tao 0002, Bing-Kun Bao. 9287-9295 [doi]
- 2Gaussian: Dynamic Control with Discretized 3D View Modeling for Text-Driven 3D Gaussian Splatting EditingYefei Sheng, Jie Wang 0061, Ming Tao 0002, Bing-Kun Bao. 9296-9305 [doi]
- Unknown Pixel Mask Based Fine-tuning of 2D Inpainting Models for Unbounded 3D Scene Generation from a Single ImageDezhi Zheng, Kaijun Deng, Xianxu Hou, Jinbao Wang, Xiaoqin Wang, LinLin Shen. 9306-9315 [doi]
- Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech SynthesisYifan Yang 0005, Shujie Liu 0001, Jinyu Li 0001, Yuxuan Hu 0003, Haibin Wu, Hui Wang 0075, Jianwei Yu 0001, Lingwei Meng, Haiyang Sun 0004, Yanqing Liu, Yan Lu 0001, Kai Yu 0004, Xie Chen 0001. 9316-9325 [doi]
- Towards Robust and Controllable Text-to-Motion via Masked Autoregressive DiffusionZongye Zhang 0002, Bohan Kong, Qingjie Liu 0001, Yunhong Wang 0001. 9326-9335 [doi]
- Latent Space Consistency for Sparse-View CT ReconstructionDuoyou Chen, Yunqing Chen, Can Zhang, Zhou Wang, Cheng Chen 0024, Ruoxiu Xiao. 9336-9344 [doi]
- 3DV-TON: Textured 3D-Guided Consistent Video Try-on via Diffusion ModelsMin Wei, Chaohui Yu, Jingkai Zhou, Fan Wang. 9345-9354 [doi]
- 33D): Collaborative Representation for Editable and Skeleton-Drivable 3D Asset GenerationYuxuan Xiong, Ye Chen 0006, Yue Shi, Zhangli Hu, Bingbing Ni. 9355-9364 [doi]
- OmniGen: Unified Multimodal Sensor Generation for Autonomous DrivingTao Tang, Enhui Ma, Xia Zhou, Letian Wang, Tianyi Yan, Xueyang Zhang, Kun Zhan, Peng Jia, Xianpeng Lang, Jia-Wang Bian, Kaicheng Yu, Xiaodan Liang. 9365-9374 [doi]
- Look Beyond: Two-Stage Scene View Generation via Panorama and Video DiffusionXueyang Kang, Zhengkang Xiang, Zezheng Zhang, Kourosh Khoshelham. 9375-9384 [doi]
- Synthesizing 3D Scenes via Diffusion Model that Incorporates Indoor Scene CharacteristicsLiang Yue, Shao-Kui Zhang, Lin Yuan, Yi-Tao Chen, Zirui Zhou, Song-Hai Zhang. 9385-9394 [doi]
- OnlineHOI: Towards Online Human-Object Interaction Generation and PerceptionYihong Ji, Yunze Liu, Yiyao Zhuo, Weijiang Yu, Fei Ma 0006, Joshua Zhexue Huang, Fei Yu 0016. 9395-9403 [doi]
- Human Motion Generation in 3D Scenes from Open-Ended Textual Instructions with MLLM PlanningSiyi Qian, Jian Fang, Yuzhou Mao, Yayun Zou, Wentao Zhang 0001, Haiwei Xue. 9404-9413 [doi]
- AICL: Action In-Context Learning for Text-to-Video GenerationJianzhi Liu, Junchen Zhu, Pengpeng Zeng, Lianli Gao, Heng Tao Shen, Jingkuan Song. 9414-9423 [doi]
- PMG: Progressive Motion Generation via Sparse Anchor Postures Curriculum LearningYingjie Xi, Jian-Jun Zhang, Xiaosong Yang. 9424-9433 [doi]
- Tora2: Motion and Appearance Customized Diffusion Transformer for Multi-Entity Video GenerationZhenghao Zhang, Junchao Liao, Xiangyu Meng, Long Qin, Weizhi Wang. 9434-9443 [doi]
- SynergyAmodal: Deocclude Anything with Text ControlXinyang Li, Chengjie Yi, Jiawei Lai, Mingbao Lin, Yansong Qu, Shengchuan Zhang, Liujuan Cao. 9444-9453 [doi]
- DiffusionMat: Alpha Matting as Deterministic Sequential Refinement LearningYangyang Xu, Shengfeng He, Wenqi Shao, Yong Du 0003, Kwan-Yee K. Wong, Yu Qiao 0001, Jun Yu 0001, Ping Luo 0002. 9454-9462 [doi]
- Single Trajectory Distillation for Accelerating Image and Video Style TransferSijie Xu, Runqi Wang, Wei Zhu, Dejia Song, Nemo Chen, Xu Tang 0007, Yao Hu 0002. 9463-9471 [doi]
- SSAIM: Not All Self-Attentions Contain Effective Spatial Structure in Diffusion Models for Text-to-Image EditingZhenbo Yu, Jimin Dai, Yingzhen Zhang, Jian Yang 0003, Lei Luo 0001. 9472-9480 [doi]
- LoCo: Training-Free Layout-to-Image Synthesis with Localized ConstraintsPeiang Zhao, Han Li, Ruiyang Jin, S. Kevin Zhou. 9481-9490 [doi]
- InteractMove: Text-Controlled Human-Object Interaction Generation in 3D Scenes with Movable ObjectsXinhao Cai, Minghang Zheng, Xin Jin, Yang Liu 0105. 9491-9499 [doi]
- Interact-Custom: Customized Human Object Interaction Image GenerationZhu Xu, Zhaowen Wang, Yuxin Peng 0001, Yang Liu 0105. 9500-9508 [doi]
- EditMaster: Bridging Text instruction and Visual Example for Multimodal guided Image EditingJiahui Zhang, Mengtian Li, Jiewei Tang, Junyu Deng, Siyu Tian, Xiang Liu, Meng Zhang, Guangnan Ye, Yu-Gang Jiang 0001. 9509-9518 [doi]
- AnyStyleDiffusion: Flexible Style Transfer with Consistent Content Adaptation Across Diffusion ModelsZhenyu Xu, Junjie Wu, Zhiyan Piao, Xiaoqi Sheng, Yu Xiao, Xinyu Zhang 0002. 9519-9528 [doi]
- LaVieID: Local Autoregressive Diffusion Transformers for Identity-Preserving Video CreationWenhui Song, Hanhui Li, Jiehui Huang, Panwen Hu, Yuhao Cheng, Long Chen, Yiqiang Yan, Xiaodan Liang. 9529-9538 [doi]
- Granular Music Attribute Transformation with Proximal Policy Optimization Adapters for Diffusion ModelKunsheng Ma, Fan Qi, Changsheng Xu. 9539-9548 [doi]
- Towards Perfection: Building Inter-component Mutual Correction for Retinex-based Low-light Image EnhancementLuyang Cao, Han Xu 0001, Jian Zhang 0090, Lei Qi 0001, Jiayi Ma 0001, Yinghuan Shi, Yang Gao 0001. 9549-9558 [doi]
- DCNOT: Diffusion-Cascaded Neural Optimal Transport for Scalable Multi-Domain Image-to-Image TranslationYingzhen Zhang, Jimin Dai, Qianliang Wu, Jian Yang 0003, Lei Luo 0001. 9559-9568 [doi]
- SVDGNet: Shapley Value-Based Weight Adjustment for Unsupervised Image Style TransferYi Han, Yaochen Li, Peijun Chen, Wenlong Zhou, Jinhuo Yang, Jintao Chang. 9569-9577 [doi]
- GOES: 3D Gaussian-based One-shot Head Animation with Any Emotion and Any StyleChuhang Ma, Shuai Tan 0002, Junjie Wei, Ye Pan. 9578-9587 [doi]
- Accelerating Diffusion Transformer via Error-Optimized CacheJunxiang Qiu, Shuo Wang 0008, Jinda Lu, Lin Liu, Houcheng Jiang, Xingyu Zhu, Yanbin Hao. 9588-9597 [doi]
- DiffArtist: Towards Structure and Appearance Controllable Image StylizationRuixiang Jiang, Chang Wen Chen. 9598-9607 [doi]
- SVGen: Interpretable Vector Graphics Generation with Large Language ModelsFeiyu Wang, Zhiyuan Zhao 0005, Yuandong Liu, Da Zhang 0010, Junyu Gao 0001, Hao Sun 0038, Xuelong Li 0001. 9608-9617 [doi]
- ISDrama: Immersive Spatial Drama Generation through Multimodal PromptingYu Zhang 0126, Wenxiang Guo, Changhao Pan, Zhiyuan Zhu, Tao Jin 0004, Zhou Zhao 0001. 9618-9627 [doi]
- Hi-Motion: Hierarchical Intention Guided Conditional Motion SynthesisLe Han, Kaixuan Chen 0004, Minchen Ye, Nenggan Zheng. 9628-9637 [doi]
- Behave Your Motion: Habit-preserved Cross-category Animal Motion TransferZhimin Zhang 0008, Bi'an Du, Caoyuan Ma, Zheng Wang 0007, Wei Hu 0003. 9638-9647 [doi]
- Retrieval Augmented 3D Garment Generation from Single ImageQixun Zeng. 9648-9656 [doi]
- Fine-tuning Bias Neurons for Fair Text-to-Image GenerationFan Qi, Zhan Wang, Changsheng Xu, Huaiwen Zhang. 9657-9666 [doi]
- DRC: Enhancing Personalized Image Generation via Disentangled Representation CompositionYiyan Xu, Wuqiang Zheng, Wenjie Wang 0007, Fengbin Zhu, Xinting Hu, Yang Zhang 0072, Fuli Feng, Tat-Seng Chua. 9667-9676 [doi]
- Reactffusion: Physical Contact-guided Diffusion Model for Reaction GenerationZihang Zhang, Shoulong Zhang, Yan Wang, Shuai Li 0001. 9677-9685 [doi]
- Uncertainty-Guided Face Matting for Occlusion-Aware Face TransformationHyebin Cho, Jaehyup Lee. 9686-9694 [doi]
- Learning Evidential Delta Denoising Scores for Video EditingYufan Hu, Kunlin Yang, Junyu Gao 0002, Bin Fan 0001, Hongmin Liu 0001. 9695-9703 [doi]
- Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head SynthesisTianqi Li, Ruobing Zheng, Minghui Yang, Jingdong Chen, Ming Yang 0007. 9704-9713 [doi]
- Individual Content and Motion Dynamics Preserved Pruning for Video Diffusion ModelsYiming Wu 0005, Zhenghao Chen, Huan Wang 0014, Dong Xu 0001. 9714-9723 [doi]
- CP3: Customizable 3D Pop-Out Effect Creation for Immersive Content Using Multimodal ModelsZezhou Chen, Ping Chen, Huan Hu, Xiang Liu, Zipeng Wang, Zhaoxiang Liu, Kai Wang 0012, Shiguo Lian. 9724-9732 [doi]
- REA-Listener: Real-Time Listening Head Generation with Dynamic Emotion Modeling and Flexible Modality AdaptationSizhe Zhao, Chenyang Wang, Weiyu Zhao, Zonglin Li, Ming Li, Shengping Zhang. 9733-9742 [doi]
- Separate to Collaborate: Dual-Stream Diffusion Model for Coordinated Piano Hand Motion SynthesisZihao Liu, Mingwen Ou, Zunnan Xu, Jiaqi Huang, Haonan Han, Ronghui Li, Xiu Li 0001. 9743-9752 [doi]
- DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion AlignmentXiaofan Li, Chenming Wu, Zhao Yang, Zhihao Xu, Yumeng Zhang, Dingkang Liang, Ji Wan, Jun Wang. 9753-9762 [doi]
- HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene GenerationHaiyang Zhou, Wangbo Yu, Jiawen Guan, Xinhua Cheng, Yonghong Tian 0001, Li Yuan 0007. 9763-9772 [doi]
- CatchPhrase: EXPrompt-Guided Encoder Adaptation for Audio-to-Image GenerationHyunwoo Oh, SeungJu Cha 0001, Kwanyoung Lee, Si-Woo Kim, Dong Jin Kim. 9773-9782 [doi]
- HiScene: Creating Hierarchical 3D Scenes with Isometric View GenerationWenqi Dong, Bangbang Yang, Zesong Yang, Yuan Li, Tao Hu 0011, Hujun Bao, Yuewen Ma, Zhaopeng Cui. 9783-9792 [doi]
- SemanticGarment: Semantic-Controlled Generation and Editing of 3D Gaussian GarmentsRuiyan Wang, Zhengxue Cheng, Zonghao Lin, Jun Ling, Yuzhou Liu, Yanru An, Rong Xie 0004, Li Song 0001. 9793-9802 [doi]
- Contextual Gesture: Co-Speech Gesture Video Generation through Context-aware Gesture RepresentationPinxin Liu, Pengfei Zhang, Hyeongwoo Kim, Pablo Garrido 0001, Ari Shapiro, Kyle Olszewski. 9803-9812 [doi]
- CitySculpt: 3D City Generation from Satellite Imagery with UV DiffusionXingbo Yao, Xuanmin Wang, Hui Xiong. 9813-9821 [doi]
- Stereo-GS: Multi-View Stereo Vision Model for Generalizable 3D Gaussian Splatting ReconstructionXiufeng Huang, Ka-Chun Cheung, Runmin Cong, Simon See, Renjie Wan. 9822-9831 [doi]
- Category-Aware 3D Object Composition with Disentangled Texture and Shape Multi-view DiffusionZeren Xiong, Zikun Chen, Zedong Zhang, Xiang Li 0041, Ying Tai, Jian Yang 0003, Jun Li 0027. 9832-9841 [doi]
- AuthFace: Towards Authentic Blind Face Restoration with Face-oriented Generative Diffusion PriorGuoqiang Liang 0003, Qingnan Fan, Bingtao Fu, Jinwei Chen, Hong Gu, Lin Wang 0025. 9842-9851 [doi]
- ANT: Adaptive Neural Temporal-Aware Text-to-Motion ModelWenshuo Chen, Kuimou Yu, Haozhe Jia, Kaishen Yuan, Zexu Huang, Bowen Tian, Songning Lai, Hongru Xiao, Erhang Zhang, Lei Wang 0108, Yutao Yue. 9852-9861 [doi]
- CoCoNO: Attention Contrast-and-Complete for Initial Noise Optimization in Text-to-Image SynthesisAravindan Kamatchi Sundaram, Ujjayan Pal, Abhimanyu Chauhan, Aishwarya Agarwal, Srikrishna Karanam. 9862-9870 [doi]
- FreeAudio: Training-Free Timing Planning for Controllable Long-Form Text-to-Audio GenerationYuxuan Jiang, Zehua Chen 0005, Zeqian Ju, Chang Li, Weibei Dou, Jun Zhu 0001. 9871-9880 [doi]
- Free-Mask: A Novel Paradigm of Integration Between the Segmentation Diffusion Model and Image EditingBo Gao 0004, Jianhui Wang, Xinyuan Song 0002, Yangfan He, Fangxu Xing, Tianyu Shi 0003. 9881-9890 [doi]
- FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion SynthesisMengchao Wang, Qiang Wang, Fan Jiang, Yaqi Fan, Yunpeng Zhang, Yonggang Qi, Kun Zhao, Mu Xu. 9891-9900 [doi]
- Inversion-DPO: Precise and Efficient Post-Training for Diffusion ModelsZejian Li, Yize Li 0001, Chenye Meng, Zhongni Liu, Ling Yang 0006, Shengyuan Zhang, Guang Yang 0022, Changyuan Yang, Zhiyuan Yang, Lingyun Sun. 9901-9910 [doi]
- Multi-Agent Amodal Completion: Direct Synthesis with Fine-Grained Semantic GuidanceHongxing Fan, Lipeng Wang 0005, Haohua Chen, Zehuan Huang, Jiangtao Wu, Lu Sheng. 9911-9919 [doi]
- CausalCtrl: Causality-Aware Control Framework for Text-Guided Visual EditingHaoxiang Cao, Chaoqun Wang 0011, Yongwen Lai, Shaobo Min, Xuejin Chen. 9920-9929 [doi]
- DualMat: PBR Material Estimation via Coherent Dual-Path DiffusionYifeng Huang, Zhang Chen, Yi Xu, Minh Hoai, Zhong Li. 9930-9939 [doi]
- PLATO: Generating Objects from Part Lists via Synthesized LayoutsAmruta Muthal, Varghese P. Kuruvilla, Ravi Kiran Sarvadevabhatla. 9940-9949 [doi]
- EmIT: Emotional Interaction control in Text-to-image diffusion modelsHaofan Zhang, Shangfei Wang. 9950-9958 [doi]
- Text Prompted Spatiotemporal Sequence Prediction with Text-Vision Prompt Refiner and Masked Diffusion TransformersYechao Xu, Zhengxing Sun, Qian Li 0014, Yunhan Sun. 9959-9968 [doi]
- Bridging the Gap: Consistent Image Outpainting via Training-Free Noise OptimizationNa Li, Zihao Li 0005, Zuoli Tang, Yuqing Yu, Lixin Zou, Chenliang Li 0005. 9969-9977 [doi]
- MPPR: Memory-Prior-based Prompt Refinement in Continuous Space for Advanced Text-to-Image GenerationZhibing Zhang, Jiantao Lin, Cangqi Zhou, Rui Xia. 9978-9986 [doi]
- Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait GenerationWeipeng Tan, Chuming Lin, Chengming Xu 0001, Feifan Xu, Xiaobin Hu, Xiaozhong Ji, Junwei Zhu, Chengjie Wang, Yanwei Fu 0001. 9987-9995 [doi]
- GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph LayoutsJunwen He, Yifan Wang 0004, Lijun Wang, Huchuan Lu, Chenyang Li 0007, Hanyuan Chen, Jin-Peng Lan, Jun-Yan He, Bin Luo 0008, Yifeng Geng. 9996-10005 [doi]
- A Dual-Branch 3D Spatial-Aware Latent Diffusion for Realistic Depth Image SynthesisShuang Hao, Pengfei Ren 0001, Lei Zhang 0094, Haifeng Sun 0001, Pan Ting, Menghao Zhang 0004, Cong Liu 0046, Qi Qi 0001, Jianxin Liao, Jingyu Wang 0001. 10006-10014 [doi]
- Joint Holistic and Lesion Controllable Mammogram Synthesis via Gated Conditional Diffusion ModelXin Li 0001, Kaixiang Yang, Qiang Li 0018, Zhiwei Wang 0002. 10015-10023 [doi]
- SpeCa: Accelerating Diffusion Transformers with Speculative Feature CachingJiacheng Liu, Chang Zou, Yuanhuiyi Lyu, Fei Ren, Shaobo Wang 0001, Kaixin Li, Linfeng Zhang 0001. 10024-10033 [doi]
- TASR: Timestep-Aware Diffusion Model for Image Super-ResolutionQinwei Lin, Xiaopeng Sun, Yu Gao, Yujie Zhong, Zheng Zhao, Dengjie Li, Haoqian Wang. 10034-10043 [doi]
- Stable Diffusion-Based Approach for Human De-OcclusionSeung Young Noh, Ju Yong Chang. 10044-10053 [doi]
- Enhancing Small-Scale Dataset Expansion with Triplet-Connection-based Sample Re-WeightingTing Xiang, Changjian Chen, Zhuo Tang, Qifeng Zhang, Fei Lyu 0007, Li Yang 0012, Jiapeng Zhang, Kenli Li 0001. 10054-10063 [doi]
- ObjCtrl: Object-based Control Relaxation for Conditional Text-to-Image GenerationXinlong Zhang, Zejian Li, Wei Li, Xiaoyu Zhang, Jia Wei, Chengyu Lin, Yongchuan Tang. 10064-10073 [doi]
- Color Me Correctly: Bridging Perceptual Color Spaces and Text Embeddings for Improved Diffusion GenerationSung-Lin Tsai, Bo-Lun Huang, Yu-Ting Shen, Cheng Yu Yeo, Chiang Tseng, Bo-Kai Ruan, Wen-Sheng Lien, Hong-Han Shuai. 10074-10082 [doi]
- MelodyEdit: Zero-shot Music Editing with Disentangled Inversion ControlHuadai Liu, Jialei Wang, Xiangtai Li, Wen Wang 0019, Qian Chen 0003, Rongjie Huang 0001, Yang Liu, Jiayang Xu, Zhou Zhao 0001, Wei Xue 0002. 10083-10092 [doi]
- Venus: Generating Large-scale mmWave Radar Data via Few 2D Videos for Gesture Recognition While Lying DownYue Ling, Dong Zhao 0001, Kaikai Deng, Kangwen Yin, Zixiao He, Yizong Wang, Huadong Ma. 10093-10102 [doi]
- Omni²: Unifying Omnidirectional Image Generation and Editing in an Omni ModelLiu Yang, Huiyu Duan, Yucheng Zhu, Xiaohong Liu 0001, Lu Liu 0005, Zitong Xu, Guangji Ma, Xiongkuo Min, Guangtao Zhai, Patrick Le Callet. 10103-10112 [doi]
- Image Retargeting based on Text Region AwarenessGang Pan 0002, Meihua Liu, Lei Zhou, Jiahao Wang, Di Sun 0001. 10113-10121 [doi]
- Modeling and Identifying Distractors with Curriculum for Robust 3D Gaussian SplattingRuiqi Li, Yiu-ming Cheung. 10122-10131 [doi]
- Prompt-Softbox-Prompt: A Free-Text Embedding Control for Image EditingYitong Yang, Yinglin Wang, Tian Zhang, Jing Wang, Shuting He. 10132-10141 [doi]
- CoProSketch: Controllable and Progressive Sketch Generation with Diffusion ModelRuohao Zhan, Yijin Li, Yisheng He, Shuo Chen, Yichen Shen 0004, Xinyu Chen, Zilong Dong, Zhaoyang Huang, Guofeng Zhang 0001. 10142-10151 [doi]
- Text2Weight: Bridging Natural Language and Neural Network Weight SpacesBowen Tian, Wenshuo Chen, Zexi Li, Songning Lai, Jiemin Wu, Yutao Yue. 10152-10160 [doi]
- ThermVision: Exploring FLUX for Synthesizing Hyper-Realistic Thermal Face Data and Animations via Image to Video TranslationMuhammad Ali Farooq, Waseem Shariff, Peter Corcoran 0001. 10161-10170 [doi]
- UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance EditingJianhong Bai, Tianyu He, YuChi Wang, Junliang Guo, Haoji Hu, Zuozhu Liu, Jiang Bian 0002. 10171-10180 [doi]
- Compute Only 16 Tokens in One Timestep: Accelerating Diffusion Transformers with Cluster-Driven Feature CachingZhixin Zheng, Xinyu Wang, Chang Zou, Shaobo Wang 0001, Linfeng Zhang 0001. 10181-10189 [doi]
- GraphSplat: Sparse-View Generalizable 3D Gaussian Splatting is Worth Graph of NodesZeyang Bai, Yunbiao Wang, Dongbo Yu, Jun Xiao 0005, Lupeng Liu. 10190-10199 [doi]
- MusFlow: Multimodal Music Generation via Conditional Flow MatchingJiahao Song, Yuzhao Wang. 10200-10209 [doi]
- PRINTER: Deformation-Aware Adversarial Learning for Virtual IHC Staining with In Situ FidelityYizhe Yuan, Bingsen Xue, Bangzheng Pu, Chengxiang Wang 0004, Cheng Jin 0005. 10210-10219 [doi]
- Generative Semantic Probing for Vision-Language Models via Hierarchical Feature OptimizationHe Wang, Longquan Dai, Shihao Pu, Shaomeng Wang, Jinhui Tang 0001. 10220-10228 [doi]
- FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow MatchingHui Wang 0075, Shujie Liu 0001, Lingwei Meng, Jinyu Li 0001, Yifan Yang 0005, Shiwan Zhao, Haiyang Sun 0004, Yanqing Liu, Haoqin Sun, Jiaming Zhou 0001, Yan Lu 0001, Yong Qin. 10229-10238 [doi]
- DEPO: Enhancing E-commerce Image Background Generation with Short Trajectory Direct Expected Preference OptimizationShikun Sun, Chengrui Wang, Min Zhou, Zixuan Wang 0026, Xiaoyu Qin, Tiezheng Ge, Bo Zheng 0007, Jia Jia 0001. 10239-10247 [doi]
- UniTalker: Conversational Speech-Visual SynthesisYifan Hu 0004, Rui Liu 0008, Yi Ren 0006, Xiang Yin 0006, Haizhou Li 0001. 10248-10257 [doi]
- Talking Head Generation via Viewpoint and Lighting Simulation Based on Global RepresentationBiao Dong, Lei Zhang. 10258-10267 [doi]
- Coding-Prior Guided Diffusion Network for Video DeblurringYike Liu, Jianhui Zhang, Haipeng Li 0001, Shuaicheng Liu, Bing Zeng 0001. 10268-10277 [doi]
- Spatial-Temporal Decomposition and Alignment in Controllable Video-to-Music GenerationWeitao You, Heda Zuo, Junxian Wu 0003, Dengming Zhang, Zhibin Zhou 0002, Lingyun Sun. 10278-10286 [doi]
- Exploring Palette based Color Guidance in Diffusion ModelsQianru Qiu, Jiafeng Mao, Xueting Wang. 10287-10295 [doi]
- StrandDesigner: Towards Practical Strand Generation with Sketch GuidanceNa Zhang, Moran Li, Chengming Xu 0001, Han Feng, Xiaobin Hu, Jiangning Zhang, Weijian Cao, Chengjie Wang, Yanwei Fu 0001. 10296-10304 [doi]
- InterAnimate: Taming Region-Aware Diffusion Model for Realistic Human Interaction AnimationYukang Lin, Yan Hong 0001, Zunnan Xu, Xindi Li, Chao Xu, Chuanbiao Song, Ronghui Li, Haoxing Chen, Jun Lan, Huijia Zhu, Weiqiang Wang 0002, Jianfu Zhang 0003, Xiu Li 0001. 10305-10314 [doi]
- Show and Polish: Reference-Guided Identity Preservation in Face Video RestorationWenKang Han, Wang Lin, Yiyun Zhou, Qi Liu, Shulei Wang, Chang Yao 0001, Jingyuan Chen. 10315-10324 [doi]
- Conducting Conditional Diffusion by Estimating the Mean Vector of von Mises-Fisher DistributionLongquan Dai, He Wang, Xiaolu Wei, Shaomeng Wang, Jinhui Tang 0001. 10325-10333 [doi]
- PTalker: Personalized Speech-Driven 3D Talking Head Animation via Style Disentanglement and Modality AlignmentBin Wang, Yang Xu, Huan Zhao 0003, Hao Zhang 0139, Zixing Zhang 0001. 10334-10342 [doi]
- Learning Uniformly Distributed Embedding Clusters of Stylistic Skills for Physically Simulated CharactersNian Liu 0003, Zilong Zhang, Zi Wang, Tengyu Liu, Hongzhao Xie, Xinyi Tong 0001, Libin Liu 0002, Yaodong Yang 0001, Zhaofeng He 0002. 10343-10351 [doi]
- Noise-Optimized Distribution Distillation for Dataset CondensationTongfei Liu, Yufan Liu 0001, Bing Li 0001, Weiming Hu 0004, Yuming Li, Chenguang Ma. 10352-10360 [doi]
- FreeInsert: Personalized Object Insertion with Geometric and Style ControlYuhong Zhang, Han Wang, Yiwen Wang, Rong Xie 0004, Li Song 0001. 10361-10369 [doi]
- Frequency Regulation for Exposure Bias Mitigation in Diffusion ModelsMeng Yu, Kun Zhan. 10370-10378 [doi]
- RealText: Realistic Text Image Generation based on Glyph and Scene Aware InpaintingZihou Liu, Dongming Zhang, Jing Zhang, Jun Li, Yongdong Zhang 0001. 10379-10387 [doi]
- DLFR-VAE: Dynamic Latent Frame Rate VAE for Video GenerationZhihang Yuan, Siyuan Wang, Yuzhang Shang, Hanling Zhang, Tongcheng Fang, Rui Xie, Shengen Yan, Guohao Dai 0002, Yu Wang 0002. 10388-10397 [doi]
- Phys4DGen: Physics-Compliant 4D Generation with Multi-Material Composition PerceptionJiajing Lin, Zhenzhong Wang, Dejun Xu, Shu Jiang, Yunpeng Gong, Min Jiang 0005. 10398-10407 [doi]
- AB-Cache: Training-Free Acceleration of Diffusion Models via Adams-Bashforth Cached Feature ReuseZichao Yu, Zhen Zou, Guojiang Shao, Chenwei Zhang, Shengze Xu, Jie Huang 0017, Feng Zhao 0004, Xiaodong Cun, Wenyi Zhang. 10408-10417 [doi]
- HumanPrinter: Reconstructing 3D Human from a Single Image Like a 3D PrinterLeyuan Liu 0001, Shen Chen 0005, Jingying Chen 0001. 10418-10426 [doi]
- Controllable Video-to-Music Generation with Multiple Time-Varying ConditionsJunxian Wu 0003, Weitao You, Heda Zuo, Dengming Zhang, Pei Chen 0005, Lingyun Sun. 10427-10436 [doi]
- FLAP: Fully-controllable Audio-driven Portrait Video Generation through 3D head conditioned diffusion modelLingzhou Mu, Baiji Liu, Ruonan Zhang, Guiming Mo, Jiawei Jin, Kai Zhang 0012, Haozhi Huang 0001. 10437-10446 [doi]
- EDMG: Towards Efficient Long Dance Motion Generation with Fundamental Movements from Dance GenresJinming Zhang, Yunlian Sun, Hongwen Zhang 0001, Jinhui Tang 0001. 10447-10456 [doi]
- SAKR-Edit: Scene-Aware Knowledge Reasoning for Text-to-Image EditingJiawen Wang, Jianjun Li 0010, Zhiyuan Ma 0005, Ruixia Bai. 10457-10466 [doi]
- ModuleTeam: Open-Set Multi-Conditional Image Generation with Training-Free Latent Mixture of Any Control ModuleYuwei Zhou, Xin Wang 0019, Hong Chen 0011, Yipeng Zhang 0003, Zeyang Zhang 0001, Wenwu Zhu 0001. 10467-10475 [doi]
- SpA2V: Harnessing Spatial Auditory Cues for Audio-driven Spatially-aware Video GenerationKien T. Pham 0001, Yingqing He, Yazhou Xing, Qifeng Chen 0001, Long Chen 0016. 10476-10485 [doi]
- AV-DiT: Taming Image Diffusion Transformers for Efficient Joint Audio and Video GenerationKai Wang 0012, Shijian Deng, Jing Shi 0005, Dimitrios Hatzinakos, Yapeng Tian. 10486-10495 [doi]
- From Pixels to Tokens: Revisiting Object Hallucinations in Large Vision-Language ModelsYuying Shang, Xinyi Zeng, Yutao Zhu 0001, Xiao Yang 0028, Zhengwei Fang, Jingyuan Zhang, Jiawei Chen, Zinan Liu, Yu Tian. 10496-10505 [doi]
- HairShifter: Consistent and High-Fidelity Video Hair Transfer via Anchor-Guided AnimationWangzheng Shi, Yinglin Zheng, Yuxin Lin, Jianmin Bao, Ming Zeng 0008, Dong Chen 0003. 10506-10515 [doi]
- Roll Your Eyes: Gaze Redirection via Explicit 3D Eyeball RotationYoungchan Choi, Hengfei Wang, Yihua Cheng, Boeun Kim, Hyung Jin Chang, Younggeun Choi 0001, Sang-Il Choi. 10516-10524 [doi]
- ACE: Concept Editing in Diffusion Models without Performance DegradationRuipeng Wang, Junfeng Fang, Jiaqi Li 0031, Hao Chen, Jie Shi, Kun Wang 0056, Xiang Wang 0010. 10525-10534 [doi]
- Immunizing Images from Text to Image Editing via Adversarial Cross-AttentionMatteo Trippodo, Federico Becattini, Lorenzo Seidenari. 10535-10543 [doi]
- PromptFlare: Prompt-Generalized Defense via Cross-Attention Decoy in Diffusion-Based InpaintingHohyun Na, Seunghoo Hong, Simon S. Woo. 10544-10553 [doi]
- Exploring Adapter Design Tradeoffs for Low Resource Music GenerationAtharva Mehta, Shivam Chauhan, Monojit Choudhury. 10554-10562 [doi]
- SAT: Supervisor Regularization and Animation Augmentation for Two-process Monocular Texture 3D Human ReconstructionGangjian Zhang, Jian Shu 0001, Nanjie Yao, Hao Wang 0094. 10563-10572 [doi]
- Drawing2CAD: Sequence-to-Sequence Learning for CAD Generation from Vector DrawingsFeiwei Qin, Shichao Lu, Junhao Hou, Changmiao Wang, Meie Fang, Ligang Liu. 10573-10582 [doi]
- 2-Edit3DV: Diffusion-Guided Style Meets Structure for Consistent Multi-View 3D Video GenerationYuqi Chen, Xiubo Liang, Yu Zhao, Hongzhi Wang 0009, Weidong Geng. 10583-10592 [doi]
- See Through the Occlusions: Few-Shot Gaussian Splatting with Layered Amodal SupervisionGwon-Jung Kim, Du Yeol Lee, Jae Hong Yang, Chae-Eun Rhee. 10593-10601 [doi]
- Generating 3D Hair Strands from Images with Diverse Styles and ViewpointsPengyu Long, Zijun Zhao, Min Ouyang, Qingcheng Zhao, Wei Yang 0034, Lan Xu 0003, Jingyi Yu 0002. 10602-10611 [doi]
- Prior-Free Augmentation for Cloth-Changing Person Re-IdentificationJiajun Zhang, Xin Li 0034, Si Wu 0002, Yong Xu 0007, Yaowei Wang 0001. 10612-10621 [doi]
- MIGE: Mutually Enhanced Multimodal Instruction-Based Image Generation and EditingXueyun Tian, Wei Li 0176, Bingbing Xu 0001, Yige Yuan, Yuanzhuo Wang, Huawei Shen. 10622-10631 [doi]
- Speech Token Prediction via Compressed-to-fine Language Modeling for Speech GenerationWenrui Liu 0003, Qian Chen 0003, Wen Wang 0019, Guanrou Yang, Weiqin Li, Minghui Fang 0002, Jialong Zuo, Xiaoda Yang, Tao Jin 0004, Jin Xu 0010, Zemin Liu, Yafeng Chen, Jionghao Bai, Zhifang Guo. 10632-10641 [doi]
- Sparse4DGS: Flow-Geometry Assisted 4D Gaussian Splatting for Dynamic Sparse View SynthesisDongdong Hu, Yang Zhou, Xiaofeng Huang, Haibing Yin, Zhu Li 0001. 10642-10651 [doi]
- Accelerating Diffusion Models via Parallel DenoisingYanming Chen 0002, Zixin Ma, Chuanguang Yang, Zhulin An, Yiwen Zhang 0001. 10652-10661 [doi]
- Robust Photo-Realistic Hand Gesture Generation: from Single View to Multiple ViewQifan Fu, Xu Chen 0030, Muhammad Asad 0001, Shanxin Yuan, Changjae Oh, Gregory G. Slabaugh. 10662-10670 [doi]
- DualDub: Video-to-Soundtrack Generation via Joint Speech and Background Audio SynthesisWenJie Tian, Xinfa Zhu, Haohe Liu, Zhixian Zhao, Zihao Chen, Chaofan Ding, Xinhan Di, Junjie Zheng, Lei Xie 0001. 10671-10680 [doi]
- CADQ: Attribute-Consistent Face Cartoonization with Cross-modal Aligned and Deformable QuantizationYongjie Hu, Yifan Jiang, Ziyun Li 0002, Fei Gao 0006, Henrik Boström, Nannan Wang 0001. 10681-10689 [doi]
- Identity-Preserving Facial Aesthetic Enhancement via Hierarchical Prompt Learning and Pivotal TuningFangli Ying, Zhihong Zhang 0011, Liting Zhou, Cathal Gurrin, Jinhai Wang. 10690-10698 [doi]
- SyMuPe: Affective and Controllable Symbolic Music PerformanceIlya Borovik, Dmitrii Gavrilev, Vladimir Viro. 10699-10708 [doi]
- CoFi-Dec: Hallucination-Resistant Decoding via Coarse-to-Fine Generative Feedback in Large Vision-Language ModelsZongsheng Cao, Yangfan He, Anran Liu, Jun Xie 0003, Zhepeng Wang 0002, Feng Chen 0044. 10709-10718 [doi]
- Learned Single-Pass Multitasking Perceptual Graphics for Immersive DisplaysDoga Yilmaz, He Wang, Towaki Takikawa, Duygu Ceylan, Kaan Aksit. 10719-10727 [doi]
- Temporal-Conditioned Symbolic Alignment for Controllable Text-to-Music GenerationZihao Zhang, Xingjiao Wu, Junjie Xu, Tianlong Ma, Tangren Yao, Wen Wu 0006, Liang He 0001. 10728-10737 [doi]
- Phys4DRT: Physics-based 4D Generation for Real-Time Interaction with Time-Frequency SupervisionYuntian Xiao, Shoulong Zhang, Zihang Zhang, Jiahao Cui 0001, Yan Wang, Shuai Li 0001. 10738-10747 [doi]
- EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text PromptingGuanrou Yang, Chen Yang, Qian Chen 0003, Ziyang Ma 0001, Wenxi Chen, Wen Wang 0019, Tianrui Wang, Yifan Yang 0005, Zhikang Niu, Wenrui Liu 0003, Fan Yu 0002, Zhihao Du, Zhifu Gao, Shiliang Zhang, Xie Chen 0001. 10748-10757 [doi]
- AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech GenerationJeongsoo Choi, Ji-Hoon Kim, Kim Sung-Bin, Tae Hyun Oh, Joon Son Chung. 10758-10767 [doi]
- Enhancing Diffusion Model Stability for Image Restoration via Gradient ManagementHongjie Wu, Mingqin Zhang, Linchao He, Ji-Zhe Zhou, Jiancheng Lv 0001. 10768-10777 [doi]
- FloorplanSBS: Synthesizing Vector Floorplans by Patch-Based Floorplan SegmentationWenming Wu 0001, Tianlei Sheng, Gaofeng Zhang, Liping Zheng. 10778-10786 [doi]
- NEXUS-O: An Omni-Perceptive and -Interactive Model for Language, Audio, and VisionChe Liu 0002, Yingji Zhang, Dong Zhang, Weijie Zhang, Chenggong Gong, Yu Lu, Shilin Zhou 0002, Ziliang Gan, Ziao Wang, Haipang Wu, Ji Liu 0003, André Freitas, Qifan Wang 0001, Zenglin Xu, Rongjunchen Zhang, Yong Dai 0001. 10787-10796 [doi]
- D-Judge: How Far Are We? Assessing the Discrepancies Between AI-synthesized and Natural Images through Multimodal GuidanceRenyang Liu 0001, Ziyu Lyu, Wei Zhou 0011, See-Kiong Ng. 10797-10806 [doi]
- TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming VideosLinli Yao, Yicheng Li, Yuancheng Wei, Lei Li 0039, Shuhuai Ren, Yuanxin Liu, Kun Ouyang, Lean Wang, Shicheng Li, Sida Li, Lingpeng Kong, Qi Liu 0049, Yuanxing Zhang, Xu Sun 0001. 10807-10816 [doi]
- DisMS-TS: Eliminating Redundant Multi-scale Features for Time Series ClassificationZhiPeng Liu, Peibo Duan, Binwu Wang, Xuan Tang, Qi Chu 0012, Changsheng Zhang 0001, Yongsheng Huang, Bin Zhang 0001. 10817-10826 [doi]
- EchoMask: Speech-Queried Attention-based Mask Modeling for Holistic Co-Speech Motion GenerationXiangyue Zhang, Jianfang Li 0001, Jiaxu Zhang, Jianqiang Ren, Liefeng Bo, Zhigang Tu 0001. 10827-10836 [doi]
- Advanced SpikingYOLOX: Extending Spiking Neural Network on Object Detection with Spike-based Partial Self-Attention and 2D-Spiking TransformerWei Miao, Jiangrong Shen, Hongming Xu 0002, Tommi Kärkkäinen, Qi Xu 0008, Yi Xu 0008, Fengyu Cong. 10837-10846 [doi]
- RadLAS: A Foundation Model for Interpretable Radiography Image Analysis with Lesion-Aware Self-Supervised Pre-trainingYihang Liu, Ying Wen, Longzhen Yang, Lianghua He, Heng Tao Shen. 10847-10856 [doi]
- Graph Unlearning Meets Influence-aware Negative Preference OptimizationQiang Chen, Zhongze Wu, Ang He, Xi Lin 0003, Shuo Jiang, Shan You, Chang Xu 0002, Yi Chen, Xiu Su. 10857-10866 [doi]
- HDCFN: Haze Distribution-aware Cross-modal Fusion Network for Infrared-guided Dense Haze Removal in UAVsJunwei Zhao, Qianchun Luo, Shiliang Zhang, Shen Gao, Jie Wu. 10867-10875 [doi]
- Incorporating the Refractory Period into Spiking Neural Networks through Spike-Triggered Threshold DynamicsYang Li, Xinyi Zeng, Zhe Xue, Pinxian Zeng, Zikai Zhang, Yan Wang. 10876-10885 [doi]
- SonicGauss: Position-Aware Physical Sound Synthesis for 3D Gaussian RepresentationsChunshi Wang, Hongxing Li, Yawei Luo. 10886-10895 [doi]
- Adaptive Graph Attention-Guided Parallel Sampling and Embedded Selection for Multi-Model FittingWenyu Yin, Shuyuan Lin, David Suter, Hanzi Wang. 10896-10904 [doi]
- Divide-Then-Rule: A Cluster-Driven Hierarchical Interpolator for Attribute-Missing GraphsYaowen Hu, Wenxuan Tu, Yue Liu 0008, Miaomiao Li 0001, Wenpeng Lu, Zhigang Luo, Xinwang Liu 0002, Ping Chen. 10905-10914 [doi]
- FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial PriorsChenxi Li, Weijie Wang, Qiang Li 0048, Nicu Sebe, Bruno Lepri, Weizhi Nie. 10915-10924 [doi]
- EasyAnimate: High-Performance Video Generation Framework with Hybrid Windows Attention and Reward BackpropagationJiaqi Xu, Kunzhe Huang, Xinyi Zou, Yunkuo Chen, Bo Liu, Mengli Cheng, Jun Huang 0007, Xing Shi. 10925-10934 [doi]
- Fast3D: Accelerating 3D Multi-modal Large Language Models for Efficient 3D Scene UnderstandingWencan Huang, Daizong Liu, Wei Hu 0003. 10935-10944 [doi]
- Detecting Violations of Physical Common Sense in Images: A Challenge Dataset and Effective ModelWeibin Wu 0002, Zitong Wang 0007, Zhengjie Luo, Wenqing Chen, Zibin Zheng. 10945-10954 [doi]
- Manipulating Multimodal Agents via Cross-Modal Prompt InjectionLe Wang, Zonghao Ying, Tianyuan Zhang 0004, Siyuan Liang 0004, Shengshan Hu, Mingchuan Zhang, Aishan Liu, Xianglong Liu 0001. 10955-10964 [doi]
- LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language ModelsShangqing Tu, Yucheng Wang, Daniel Zhang-li, Yushi Bai, Jifan Yu, Yuhao Wu, Lei Hou 0001, Huiqin Liu, Zhiyuan Liu 0001, Bin Xu 0001, Juanzi Li. 10965-10974 [doi]
- TimesBERT: A BERT-Style Foundation Model for Time Series UnderstandingHaoran Zhang, Yong Liu, Yunzhong Qiu, Haixuan Liu, Zhongyi Pei, Jianmin Wang 0001, Mingsheng Long. 10975-10983 [doi]
- Enhancing Non-Core Language Instruction-Following in Speech LLMs via Semi-Implicit Cross-Lingual CoT ReasoningHongfei Xue, Yufeng Tang, Hexin Liu, Jun Zhang, Xuelong Geng, Lei Xie 0001. 10984-10993 [doi]
- Mavors: Multi-granularity Video Representation for Multimodal Large Language ModelYang Shi 0009, Jiaheng Liu, Yushuo Guan, Zhenhua Wu, Yuanxing Zhang, Zihao Wang, Weihong Lin, Jingyun Hua, Zekun Wang, Xinlong Chen, Bohan Zeng, Wentao Zhang 0001, Fuzheng Zhang, Wenjing Yang 0002, Di Zhang 0026. 10994-11003 [doi]
- SVGThinker: Instruction-Aligned and Reasoning-Driven Text-to-SVG GenerationHanqi Chen, Zhongyin Zhao, Ye Chen 0006, Zhujin Liang, Bingbing Ni. 11004-11012 [doi]
- Merging-Resistant Watermarking for LoRA ModulesNa Zhao, Kejiang Chen, Yuang Qi, Kai Zeng, Weiming Zhang 0001, Nenghai Yu. 11013-11021 [doi]
- Multimodal Markup Document Models for Graphic Design CompletionKotaro Kikuchi, Ukyo Honda, Naoto Inoue, Mayu Otani, Edgar Simo-Serra, Kota Yamaguchi. 11022-11031 [doi]
- Boosting Chart-to-Code Generation in MLLM via Dual Preference-Guided RefinementZhihan Zhang 0003, Yixin Cao 0002, Lizi Liao. 11032-11041 [doi]
- Spiking Neural Networks with Temporal Attention-Guided Adaptive Fusion for imbalanced Multi-modal LearningJiangrong Shen, Yulin Xie, Qi Xu 0008, Gang Pan 0001, Huajin Tang, Badong Chen. 11042-11051 [doi]
- FaceInsight: A Multimodal Large Language Model for Face PerceptionJingzhi Li 0002, Changjiang Luo, Ruoyu Chen 0001, Hua Zhang 0008, Wenqi Ren, Jianhou Gan, Xiaochun Cao. 11052-11061 [doi]
- Maximum Redundancy Pruning: A Principle-Driven Layerwise Sparsity Allocation for LLMsChang Gao 0007, Kang Zhao, Runqi Wang, Jianfei Chen 0001, Liping Jing. 11062-11070 [doi]
- Embodied-R: Collaborative Framework for Activating Embodied Spatial Reasoning in Foundation Models via Reinforcement LearningBaining Zhao, Ziyou Wang, Jianjie Fang, Chen Gao 0001, Fanhang Man, Jinqiang Cui, Xin Wang 0019, Xinlei Chen, Yong Li 0008, Wenwu Zhu 0001. 11071-11080 [doi]
- Mono3R: Exploiting Monocular Cues for Geometric 3D ReconstructionWenyu Li, Sidun Liu, Peng Qiao, Yong Dou. 11081-11090 [doi]
- SemGesture: Synthesizing Semantically Enhanced and Coherent GesturesPengsheng Liu, Zhaojie Chu, Xiaofen Xing, Xiangmin Xu. 11091-11100 [doi]
- CauRDG: Enhancing Domain Generalization with Causal-Driven Semantic Consistency ReasoningZongxin Liu, Yishu Liu 0002, Guangming Lu 0002, Xiaoling Luo, Bingzhi Chen. 11101-11110 [doi]
- TongGu-VL: Advancing Visual-Language Understanding in Chinese Classical Studies through Parameter Sensitivity-Guided Instruction TuningJiahuan Cao, Yang Liu 0353, Peirong Zhang 0001, Yongxin Shi, Kai Ding 0009, Lianwen Jin. 11111-11120 [doi]
- PRISM: A Benchmark for Unveiling Cross-modal Knowledge Inconsistency in Large Vision-Language ModelsMingjie Wei, Wei-Nan Zhang 0003, Chen Zhang, Yifeng Ding, Donglin Di, Lei Ren, Wei Chen 0089, Ting Liu 0001. 11121-11129 [doi]
- VISA: Group-wise Visual Token Selection and Aggregation via Graph Summarization for Efficient MLLMs InferencePengfei Jiang, Hanjun Li 0002, Linglan Zhao, Fei Chao 0001, Ke Yan, Shouhong Ding, Rongrong Ji. 11130-11139 [doi]
- LLM: Attribute Completion in Heterogeneous Graph with Integration of External Knowledge from Large Language ModelsZongxing Zhao, Shenzhi Yang, Xingkai Yao, Yuying Wang, Zhongqiu Chen, Xiaofang Zhang. 11140-11149 [doi]
- Foresail: LLM Sensor Knowledge Empowered Status-guided Network for Multivariate Time-series ClassificationYuhan Jing, Bo He 0003, Haifeng Sun 0001, Qi Qi 0001, Zirui Zhuang, Lei Zhang 0094, Jianxin Liao, Jingyu Wang 0001. 11150-11159 [doi]
- TabiMed: Tabularizing Medical Images for Few-Shot In-Context DiagnosisWanying Zhou, Yuqi Sun, Yu Ling, Zhen Xing, Chenxi Ma, Weimin Tan, Bo Yan. 11160-11169 [doi]
- SafeDriveRAG: Towards Safe Autonomous Driving with Knowledge Graph-based Retrieval-Augmented GenerationHao Ye, Mengshi Qi, Zhaohong Liu, Liang Liu 0001, Huadong Ma. 11170-11178 [doi]
- REMEMBER: Retrieval-based Explainable Multimodal Evidence-guided Modeling for Brain Evaluation and Reasoning in Zero- and Few-shot Neurodegenerative DiagnosisDuy-Cat Can, Quang-Huy Tang, Huong Ha 0002, Binh T. Nguyen 0001, Oliver Y. Chén. 11179-11188 [doi]
- MVQA-68K: A Multi-dimensional and Causally-annotated Dataset with Quality Interpretability for Video AssessmentYanyun Pu, Kehan Li, Zeyi Huang, Zhijie Zhong, Kaixiang Yang. 11189-11198 [doi]
- Self-Aware Safety Augmentation: Leveraging Internal Semantic Understanding to Enhance Safety in Vision-Language ModelsWanying Wang, Zeyu Ma 0003, Han Zheng, Xin Tan 0002, Mingang Chen. 11199-11208 [doi]
- ICAS: Detecting Training Data from Autoregressive Image Generative ModelsHongyao Yu, Yixiang Qiu, Yiheng Yang, Hao Fang 0011, Tianqu Zhuang, Jiaxin Hong, Bin Chen 0011, Hao Wu, Shu-Tao Xia. 11209-11217 [doi]
- SafeCFG: Controlling Harmful Features with Dynamic Safe Guidance for Safe GenerationJiadong Pan, Liang Li 0003, Hongcheng Gao, Zheng-Jun Zha, Qingming Huang, Jiebo Luo 0001. 11219-11228 [doi]
- DetectAnyLLM: Towards Generalizable and Robust Detection of Machine-Generated Text Across Domains and ModelsJiachen Fu, Chun-Le Guo, Chongyi Li. 11229-11238 [doi]
- DepthDark: Robust Monocular Depth Estimation for Low-Light EnvironmentsLongjian Zeng, Zunjie Zhu, Rongfeng Lu, Ming Lu, Bolun Zheng, Chenggang Yan 0001, Anke Xue. 11239-11248 [doi]
- Bimodal Debiasing for Text-to-Image Diffusion: Adaptive Guidance in Textual and Visual SpacesLiu Yu 0001, Jiajun Sun, Ping Kuang, Rui Zhou, Fan Zhou 0002, Zhikun Feng. 11249-11258 [doi]
- Generate Aligned Anomaly: Region-Guided Few-Shot Anomaly Image-Mask Pair Synthesis for Industrial InspectionYilin Lu, Jianghang Lin, Linhuang Xie, Kai Zhao, Yansong Qu, Shengchuan Zhang, Liujuan Cao, Rongrong Ji. 11259-11268 [doi]
- Zero Matrix guided Adaptive Image Vaccine against Diffusion Model-based MultitaskYujiang Li, Zhili Zhou 0001, Ruohan Meng, Baowei Wang, Xiaojuan Wang, Cheng Qiao, Jiantao Zhou 0001. 11269-11278 [doi]
- Universally Unfiltered and Unseen: Input-Agnostic Multimodal Jailbreaks against Text-to-Image Model SafeguardsSong Yan 0001, Hui Wei 0004, Jinlong Fei, Guoliang Yang 0005, Zhengyu Zhao 0001, Zheng Wang 0007. 11279-11287 [doi]
- FORGET ME: Federated Unlearning for Face Generation ModelsFan Qi, Ao Liu, Zixin Zhang 0004, Changsheng Xu. 11288-11297 [doi]
- MRED-14: A Benchmark for Low-Energy Residential Floor Plan Generation with 14 Flexible InputsPengyu Zeng, Jun Yin, Haoyuan Sun, Yuqin Dai, Maowei Jiang, Miao Zhang, Shuai Lu. 11298-11307 [doi]
- Beyond Artificial Misalignment: Detecting and Grounding Semantic-Coordinated Multimodal ManipulationsJinjie Shen, Yaxiong Wang, Lechao Cheng, Nan Pu, Zhun Zhong. 11308-11317 [doi]
- Knowledge Negative Distillation: Circumventing Overfitting to Unlock More Generalizable Deepfake DetectionJipeng Liu, Haichao Shi, Yaru Zhang, Xiao-Yu Zhang. 11318-11327 [doi]
- ICE: Intercede Concept Erasure in Text-to-Image Diffusion ModelsYizhou Lin, Nisha Huang, Kaer Huang, Henglin Liu, Yiqiang Yan, Jie Guo, Tong-Yee Lee, Xiu Li 0001. 11328-11336 [doi]
- Revisiting Data Auditing in Large Vision-Language ModelsHongyu Zhu 0004, Sichu Liang, Wenwen Wang, Boheng Li, Tongxin Yuan, Fangqi Li, Hanyi Wang, Shi-Lin Wang, Zhuosheng Zhang 0001. 11337-11346 [doi]
- CopyJudge: Automated Copyright Infringement Identification and Mitigation in Text-to-Image Diffusion ModelsShunchang Liu, Zhuan Shi, Lingjuan Lyu, Yaochu Jin, Boi Faltings. 11347-11356 [doi]
- Differentially Private Visual Learning with Public Subspace Augmented by Synthetic DataHaichao Sha, Yuncheng Wu, Ruixuan Liu, Yang Cao 0011, Hong Chen 0001. 11357-11366 [doi]
- Detecting Synthetic Image by Cross-Modal Commonality InteractionKai Li, Wenqi Ren, Wei Wang 0335, Linchao Zhang, Xiaochun Cao. 11367-11375 [doi]
- Latent Diffusion Unlearning: Protecting Against Unauthorized Personalization Through Trajectory Shifted PerturbationsNaresh Kumar Devulapally, Shruti Agarwal, Tejas Gokhale, Vishnu Suresh Lokhande. 11376-11384 [doi]
- Safe-BVAR: Text-to-Image Generative Watermarking for Bitwise Visual AutoRegressive ModelShengjiu Dai, Xiujian Liang, Sheng Li 0006, Zhenxing Qian, Xinpeng Zhang 0001. 11385-11393 [doi]
- RealHD: A High-Quality Dataset for Robust Detection of State-of-the-Art AI-Generated ImagesHanzhe Yu, Yun Ye, Jintao Rong, Qi Xuan, Chen Ma 0003. 11394-11403 [doi]
- Enhancing Democratic Mediation through Norm-Awareness in Generative Agent SocietiesTianjiao Xu, Hao Fu, Suiyang Zhang, Jianhua Yin, Tian Gan 0002, Liqiang Nie. 11404-11413 [doi]
- Secure Tug-of-War (SecTOW): Iterative Defense-Attack Training with Reinforcement Learning for Multimodal Model SecurityMuzhi Dai, Shixuan Liu, Zhiyuan Zhao 0005, Junyu Gao 0001, Hao Sun 0038, Xuelong Li 0001. 11414-11423 [doi]
- Rethinking Individual Fairness in Deepfake DetectionAryana Hou, Li Lin, Justin Li, Shu Hu 0001. 11424-11433 [doi]
- Towards Culturally Fair Multimodal Generation: Quantifying and Mitigating Orientalist Biases in Text-to-Visual ModelsYifan Zeng, Fangzhou Dong, Jian Zhao 0013, Peijia Zheng, Jian Li, Huiyu Zhou. 11434-11442 [doi]
- MaXsive: High-Capacity and Robust Training-Free Generative Image Watermarking in Diffusion ModelsPoyuan Mao, Cheng-Chang Tsai, Chun-Shien Lu. 11443-11452 [doi]
- Mitigating Stereotypes in Text-to-Image Generation: A Novel Perspective of Selective Neural SuppressionJunlei Zhou, Jiashi Gao, Xinwei Guo, Haiyan Wu, Quanying Liu, Xiangyu Zhao 0001, Hongxin Wei, Xin Yao 0001, Xuetao Wei. 11453-11461 [doi]
- NDM: A Noise-driven Detection and Mitigation Framework against Implicit Sexual Intentions in Text-to-Image GenerationYitong Sun 0002, Yao Huang, Ruochen Zhang, Huanran Chen, Shouwei Ruan, Ranjie Duan, Xingxing Wei 0001. 11462-11471 [doi]
- SCOL: Style Code Orchestration in Latent Space for Proactive Face-Swapping DefenseEungi Lee, Jae Hyun Yoon, Seok Bong Yoo. 11472-11481 [doi]
- Hot-Swap MarkBoard: An Efficient Black-box Watermarking Approach for Large-scale Model DistributionZhicheng Zhang, Peizhuo Lv, Mengke Wan, Jiang Fang, Diandian Guo, Yezeng Chen, YinLong Liu, Wei Ma, Jiyan Sun, Liru Geng. 11482-11491 [doi]
- K-Space Bispectrum Steganography for Robust Unlearnable DataJiahao Li 0007, Yiqiang Chen 0001, Yunbing Xing, Yang Gu 0001, Xiangyuan Lan. 11492-11501 [doi]
- Choose Your Expert: Uncertainty-Guided Expert Selection for Continual Deepfake DetectionXueyi Zhang 0004, Peiyin Zhu, Jinping Sui, Xiaoda Yang, Jiahe Tian, Mingrui Lao, Siqi Cai 0002, Yanming Guo, Jun Tang 0001. 11502-11511 [doi]
- ExDA: Towards Universal Detection and Plug-and-Play Attribution of AI-Generated Ex-Regulatory ImagesWenpeng Mu, Zheng Li, Qiang Xu 0007, Xinghao Jiang, Tanfeng Sun. 11512-11521 [doi]
- PiMMNet: Introducing Multi-Modal Precipitation Nowcasting via a Physics-informed PerspectiveDemin Yu, Wenchuan Du, Kenghong Lin, Xutao Li 0001, Yunming Ye, Chuyao Luo, Xunlai Chen. 11522-11531 [doi]
- Appearance Contrasts for Unconstrained Age EstimationJilong Wei, Yangyang Hu, Xiangjuan Wu, Yiqiang Wu, Hao Liu 0019. 11532-11541 [doi]
- Audio-Visual Asynchrony Mitigation: Cross-Modal Alignment and Feature Reconstruction for Deepfake DetectionYan Wang 0088, Qindong Sun, Dongzhu Rong. 11542-11551 [doi]
- From Failures to Fixes: LLM-Driven Scenario Repair for Self-Evolving Autonomous DrivingXinyu Xia 0002, Xingjun Ma, Yunfeng Hu 0003, Ting Qu, Hong Chen 0003, Xun Gong 0007. 11552-11561 [doi]
- Diffusion-based Adversarial Identity Manipulation for Facial Privacy ProtectionLiqin Wang, Qianyue Hu, Wei Lu 0001, Xiangyang Luo 0001. 11562-11571 [doi]
- Towards Open-world Generalized Deepfake Detection: General Feature Extraction via Unsupervised Domain AdaptationMidou Guo, Qilin Yin, Wei Lu 0001, Xiangyang Luo 0001. 11572-11580 [doi]
- A Multimodal Deviation Perceiving Framework for Weakly-Supervised Temporal Forgery LocalizationWenbo Xu, Junyan Wu, Wei Lu 0001, Xiangyang Luo 0001, Qian Wang 0002. 11581-11589 [doi]
- Protecting Copyright of Medical Pre-trained Language Models: Training-Free Backdoor Model WatermarkingCong Kong, Rui Xu, Jiawei Chen, Zhaoxia Yin. 11590-11599 [doi]
- Generalizable Audio Deepfake Detection via Risk-Aware Style Alignment and Structural Empirical Risk MinimizationMingru Yang, Yanmei Gu, Qianhua He, Peirong Zhang, Haolin He, Zhiming Wang, Huijia Zhu, Jian Liu, Weiqiang Wang 0002. 11600-11609 [doi]
- WhiADD: Semantic-Acoustic Fusion for Robust Audio Deepfake DetectionJianqiao Cui, Bingyao Yu, Qihao Wang, Fei Meng, Jiwen Lu. 11610-11618 [doi]
- SiFMimicEvader: Evading Fake Voice Detection with Adversarial Neural Mimicry AttacksXuan Hai, Xin Liu 0050, Zihao Zhang, Ziyao Yu, Xiangzhen Kong, Song Li 0006, Weina Niu, Rui Zhou 0005, Qingguo Zhou. 11619-11628 [doi]
- AdvPainting: Clean-text Jailbreaking Against Inpainting ModelsBingqian Zhou, Zhihao Wu, Yushi Cheng, Wenyuan Xu 0001. 11629-11637 [doi]
- Enkidu: Universal Frequential Perturbation for Real-Time Audio Privacy Protection against Voice DeepfakesZhou Feng, Jiahao Chen, Chunyi Zhou 0001, Yuwen Pu, Qingming Li, Tianyu Du, Shouling Ji. 11638-11647 [doi]
- Evaluating the Robustness of Multimodal Agents Against Active Environmental Injection AttacksYurun Chen 0002, Xueyu Hu, Keting Yin, Juncheng Li 0006, Shengyu Zhang 0001. 11648-11656 [doi]
- Analytic Synaptic Dynamic Scaling Balancer for Multimodal Deepfake Continual DetectionMan Xiao, Jianbin Ye, Bo Liu 0014, Zijian Gao, Kele Xu, Xiaodong Wang. 11657-11666 [doi]
- SpecXNet: A Dual-Domain Convolutional Network for Robust Deepfake DetectionInzamamul Alam, Md Tanvir Islam, Simon S. Woo. 11667-11676 [doi]
- ShieldVLM: Safeguarding the Multimodal Implicit Toxicity via Deliberative Reasoning with LVLMsShiyao Cui, Qinglin Zhang, Xuan Ouyang, Renmiao Chen, Zhexin Zhang, Yida Lu, Hongning Wang, Han Qiu 0001, Minlie Huang. 11677-11686 [doi]
- Do Existing Testing Tools Really Uncover Gender Bias in Text-to-Image Models?Yunbo Lyu, Zhou Yang 0003, Yuqing Niu, Jing Jiang, David Lo 0001. 11687-11696 [doi]
- DFPD: Dual-Forgery Proactive Defense against Both Deepfakes and Traditional Image ManipulationsBeijing Chen, Yuting Hong, Ziqiang Li 0001, Zhangjie Fu. 11697-11705 [doi]
- Evaluating and Mitigating Sycophancy in Large Vision-Language ModelsJiayi Gao, Huaiwen Zhang. 11706-11715 [doi]
- From Prediction to Explanation: Multimodal, Explainable, and Interactive Deepfake Detection Framework for Non-Expert UsersShahroz Tariq, Simon S. Woo, Priyanka Singh, Irena Irmalasari, Saakshi Gupta, Dev Gupta. 11716-11725 [doi]
- Frequency-aware Correlation Discovering and Spatial Forgery Clue Distilling for Synthetic Image DetectionJiehua Zhang, Liang Li 0003, Chenggang Yan 0001, Wei Ke 0003, Yihong Gong. 11726-11735 [doi]
- ALLM4ADD: Unlocking the Capabilities of Audio Large Language Models for Audio Deepfake DetectionHao Gu, Jiangyan Yi, Chenglong Wang 0001, Jianhua Tao 0001, Zheng Lian 0004, Jiayi He, Yong Ren, Yujie Chen 0006, Zhengqi Wen. 11736-11745 [doi]
- RAIDX: A Retrieval-Augmented Generation and GRPO Reinforcement Learning Framework for Explainable Deepfake DetectionTianxiao Li, Zhenglin Huang, Haiquan Wen, Yiwei He, Shuchang Lyu, Baoyuan Wu, Guangliang Cheng. 11746-11755 [doi]
- JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual SteeringRenmiao Chen, Shiyao Cui, Xuancheng Huang, Chengwei Pan, Victor Shea-Jay Huang, Qinglin Zhang, Xuan Ouyang, Zhexin Zhang, Hongning Wang, Minlie Huang. 11756-11765 [doi]
- Multi-level SSL Feature Gating for Audio Deepfake DetectionHoan My Tran, Damien Lolive, Aghilas Sini, Arnaud Delhay, Pierre-François Marteau, David Guennec. 11766-11775 [doi]
- Toxicity Begets Toxicity: Unraveling Conversational Chains in Political PodcastsNaquee Rizwan, Nayandeep Deb, Sarthak Roy, Vishwajeet Singh Solanki, Kiran Garimella 0001, Animesh Mukherjee 0001. 11776-11784 [doi]
- Tackling Device Data Distribution Real-time Shift via Prototype-based Parameter EditingZheqi Lv, Wenqiao Zhang, Kairui Fu, Qi Tian 0001, Shengyu Zhang 0001, Jiajie Su, Jingyuan Chen, Kun Kuang, Fei Wu 0001. 11785-11794 [doi]
- Configuring Dynamic Multi-Stage Serverless Pipelines for Video Processing with Minimal Profiling OverheadJiaye Zhang, Hongyi Wang, Peiru Yang, Zili Meng, Mingwei Xu 0001. 11795-11804 [doi]
- Decode-What-Matters: Frame-Level Parallel Generative Decoding to Accelerate Large-Scale Video AnalyticsXiaokun Wang 0002, Yuting Yan, Sheng Zhang 0001, Andong Zhu 0001, Ning Chen 0010, Yu Chen 0038, Zhuzhong Qian, Sanglu Lu, Yu Liang 0001. 11805-11814 [doi]
- ViTraj: Learning Dual-Side Representations for Vehicle-Infrastructure Cooperative Trajectory PredictionShengzhe You, Libo Weng, Fei Gao 0014. 11815-11824 [doi]
- A Satellite-Ground Synergistic Large Vision-Language Model System for Earth ObservationYuxin Zhang, Jiahao Yang, Zhe Chen 0015, Wenjun Zhu, Jin Zhao 0001, Yue Gao 0001. 11825-11833 [doi]
- Overfitted Point Cloud Attribute Codec Using Sparse Hierarchical Implicit Neural RepresentationsZhe Sun, Qiang Xu, Qi Zhang 0029, Shan Liu 0001, Ge Li 0002. 11834-11843 [doi]
- DynFed: Adaptive Federated Learning via Quantization-Aware Knowledge DistillationNan He, Yiming Chen, Zheng Jiang, Song Yang 0002, Lifeng Sun. 11844-11852 [doi]
- ASTER: Adaptive Dynamic Layer-Skipping for Efficient Transformer Inference via Markov Decision ProcessFangxin Liu, Junjie Wang, Ning Yang, Zongwu Wang, Junping Zhao, Li Jiang 0002, Haibing Guan. 11853-11861 [doi]
- Lightweight Remote Sensing Scene Classification on Edge Devices via Knowledge Distillation and Early-exitYang Zhao 0002, Shusheng Li, Xueshang Feng. 11862-11870 [doi]
- Gaze-Adaptive Foveation for Remote Rendered VRAdhi Widagdo, Teemu Kämäräinen, Ahmad Alhilal, Matti Siekkinen, Cheng-Hsin Hsu. 11871-11879 [doi]
- T-GRAG: A Dynamic GraphRAG Framework for Resolving Temporal Conflicts and Redundancy in Knowledge RetrievalDong Li, Yichen Niu, Ying Ai, Xiang Zou, Biqing Qi, Jianxing Liu. 11880-11889 [doi]
- Device-Cloud Collaborative Learning Framework for Efficient Unknown Object DetectionKewei Zhao, Xiaowei Hu, Qinya Li. 11890-11898 [doi]
- SplatPose: On-Device Outdoor AR Pose Estimation Using Gaussian SplattingWeiwu Pang, Rajrup Ghosh, Jiawei Yang, Ziyu Wei, Branden Leong, Yue Wang, Ramesh Govindan. 11899-11908 [doi]
- FCG: High-Throughput JPEG Heterogeneous Inference with Hybrid Parallel Pipeline on Mobile DevicesYoubo Mao, Ziyang Kang, Pengfei Li, Jiyao Chen, Zenglin Yang, Zhijun Li. 11909-11917 [doi]
- GraphWorld: Ultra-fast Graph Engine for World-Wide Web SearchingXinbiao Gan, Qiang Zhang 0053, Tiejun Li, Chunye Gong, Kai Lu 0001. 11918-11927 [doi]
- BS3: Bézier Slicing Middleware for 3D Mesh LOD OptimizationLehao Lin, Baohua Fang, Ziheng Sun, Ke Wang 0013, Hong Kang, Wei Cai 0002. 11928-11936 [doi]
- Quantization Meets OOD: Generalizable Quantization-aware Training from a Flatness PerspectiveJiacheng Jiang, Yuan Meng, Chen Tang, Han Yu, Qun Li, Zhi Wang 0001, Wenwu Zhu 0001. 11937-11946 [doi]
- FedBAP: Backdoor Defense via Benign Adversarial Perturbation in Federated LearningXinhai Yan, Libing Wu, Zhuangzhuang Zhang, Bingyi Liu, Lijuan Huo, Jing Wang. 11947-11955 [doi]
- Towards Effective Open-set Graph Class-incremental LearningJiazhen Chen, Zheng Ma 0011, Sichao Fu, Mingbin Feng, Tony S. Wirjanto, Weihua Ou. 11956-11965 [doi]
- Multi-Width Neural Network-Assisted Hierarchical Federated Learning in Heterogeneous Cloud-Edge-Device ComputingHaizhou Wang, Guobing Zou, Fei Xu, Yangguang Cui, Tongquan Wei. 11966-11975 [doi]
- Unlocking Financial Insights: An advanced Multimodal Summarization with Multimodal Output Framework for Financial Advisory VideosSarmistha Das, R. E. Zera Marveen Lyngkhoi, Sriparna Saha 0001, Alka Maurya. 11976-11985 [doi]
- PIRA: Pan-CDN Intra-video Resource Adaptation for Short Video StreamingChunyu Qiao, Tong Liu, Yucheng Zhang, Zhiwei Fan, Pengjin Xie, Zhen Wang, Liang Liu 0001. 11987-11995 [doi]
- Generalizing to New Area: Self-Distillation Curriculum Learning for Fine-Grained Cross View LocalizationFenghao Tian, Mingtao Feng, Jianqiao Luo, Zijie Wu, Longlong Mei, Lijie Yang, Weisheng Dong, Yaonan Wang 0001. 11996-12005 [doi]
- How2Compress: Scalable and Efficient Edge Video Analytics via Adaptive Granular Video CompressionYuheng Wu 0006, Thanh Tung Nguyen, Lucas Liebe, Quang Tau, Pablo Espinosa Campos, Jinghan Cheng, Dongman Lee. 12006-12015 [doi]
- Neural Video Compression with In-Loop Contextual Filtering and Out-of-Loop Reconstruction EnhancementYaojun Wu 0001, Chaoyi Lin, Yiming Wang, Semih Esenlik, Zhaobin Zhang, Kai Zhang 0007, Li Zhang 0006. 12016-12024 [doi]
- VidIQ: Inference-Aware Neural Codecs for Quality-Enhanced, Real-Time Video AnalyticsAndong Zhu 0001, Sheng Zhang 0001, Xiaohang Shi 0001, Hesheng Sun, Yu Liang 0001, Zhuzhong Qian, Han Zheng, Xiaokun Wang 0002, Ning Jiang. 12025-12034 [doi]
- Beyond Interpretability: Exploring the Comprehensibility of Adaptive Video Streaming through Large Language ModelsLianchen Jia, Chaoyang Li 0002, Ziqi Yuan, Jiahui Chen, Tianchi Huang, Jiangchuan Liu, Lifeng Sun. 12035-12044 [doi]
- Progressive Learning with Human Feedback for Personalized Adaptive Video StreamingZhaohui Jiang, Xuening Feng, Tianchi Huang, Ruixiao Zhang, Paul Weng, Yifei Zhu. 12045-12053 [doi]
- Eyes on the Road, Mind Beyond Vision: Context-Aware Multi-modal Enhanced Risk AnticipationJiaxun Zhang, Haicheng Liao, Yumu Xie, Chengyue Wang 0001, Yanchen Guan, Bin Rao, Zhenning Li 0001. 12054-12063 [doi]
- 2VS: Progressive Partition-Based Volumetric Video Streaming under Network DynamicsJingrou Wu, Haoxian Liu, Jin Zhang 0001, Dan Wang 0002, Jing Jiang 0002. 12064-12073 [doi]
- Congestion Control for VR Cloud Gaming: Integration and Comparison in Real VR Gaming EnvironmentAhmad Alhilal, Ze Wu 0006, Teemu Kämäräinen, Tristan Braud, Matti Siekkinen. 12074-12082 [doi]
- EHVC: Efficient Hierarchical Reference and Quality Structure for Neural Video CodingJunqi Liao, Yaojun Wu 0001, Chaoyi Lin, Zhipin Deng, Li Li 0040, Dong Liu 0002, Xiaoyan Sun 0001. 12083-12091 [doi]
- Efficient Semantic Codec for Real-time Vibrotactile TransmissionRunjie Wang, Kemi Chen, Shuijie Li, Mingkai Chen 0001, Tiesong Zhao. 12092-12101 [doi]
- INDS: Incremental Named Data Streaming for Real-Time Point Cloud VideoRuonan Chai, Yixiang Zhu, Xinjiao Li, Jiawei Li, Zili Meng, Dirk Kutscher. 12102-12110 [doi]
- RUN: A Case for Cross-Layer Networked Virtual RealityYufeng Chen, Umakant Kulkarni, Voicu Popescu, Sonia Fahmy. 12111-12120 [doi]
- Watch, Skip, Repeat: Hotspot-Aware Joint Optimization for Video StreamingDaoxu Sheng, Qi Qi 0001, Jingyu Wang 0001, Jianxin Liao. 12121-12130 [doi]
- Themis: Toward Stable Near-Zero Queuing Delay in Congestion Control for Low-Latency Interactive Video StreamingFeida Liu, Yifan Wang, Jiaqi Zheng 0001, Boxi Liu, Guihai Chen. 12131-12139 [doi]
- Let Your Video Listen to Your Music! - Beat-Aligned, Content-Preserving Video Editing with Arbitrary MusicXinyu Zhang 0017, Dong Gong, Zicheng Duan, Anton van den Hengel, Lingqiao Liu. 12140-12149 [doi]
- Compliance Rating Scheme: A Data Provenance Framework for Generative AI DatasetsMatyas Bohacek, Ignacio Vilanova Echavarri. 12150-12159 [doi]
- A Novel Perspective on Low-Light Image Enhancement: Leveraging Artifact Regularization and Walsh-Hadamard TransformWeilin Wu, Shifan Yang, Qizhao Lin, Xinghong Chen, Kunping Yang, Jing Wang, Guannan Chen. 12160-12169 [doi]
- Teaching AI to Feel: A Collaborative, Full-Body Exploration of Emotive CommunicationEsen K. Tütüncü, Lissette Lemus, Kris Pilcher, Holger Sprengel, Jordi Sabater-Mir. 12170-12178 [doi]
- Towards Modality Generalization: A Benchmark and Prospective AnalysisXiaohao Liu, Xiaobo Xia, Zhuo Huang, See-Kiong Ng, Tat-Seng Chua. 12179-12188 [doi]
- MarkSplatter: Generalizable Watermarking for 3D Gaussian Splatting Model via Splatter Image StructureXiufeng Huang, Ziyuan Luo, Qi Song 0003, Ruofei Wang, Renjie Wan. 12189-12198 [doi]
- Align 3D Representation and Text Embedding for 3D Content PersonalizationQi Song 0003, Ziyuan Luo, Ka-Chun Cheung, Simon See, Renjie Wan. 12199-12208 [doi]
- Financial Models meets Generative Art: Black-Scholes-Inspired Concept Blending in Text-to-Image DiffusionDivya Kothandaraman, Ming C. Lin, Dinesh Manocha. 12209-12217 [doi]
- Generative Flow Networks for Personalized Multimedia Systems: A Case Study on Short Video FeedsYili Jin 0001, Ling Pan, Rui-Xiao Zhang, Jiangchuan Liu, Xue Liu 0001. 12218-12226 [doi]
- Singing Timbre Popularity Assessment Based on Multimodal Large Foundation ModelZihao Wang, Ruibin Yuan, Ziqi Geng, Hengjia Li, Xingwei Qu, Xinyi Li, Songye Chen, Haoying Fu, Roger B. Dannenberg, Kejun Zhang. 12227-12236 [doi]
- Generative AI for Multimedia Communication: Recent Advances, An Information-Theoretic Framework, and Future OpportunitiesYili Jin 0001, Xue Liu 0001, Jiangchuan Liu. 12237-12246 [doi]
- MLLMs Meet Person Re-identificationMengying Duan, He Li 0054, Mang Ye. 12247-12256 [doi]
- T2UE: Generating Unlearnable Examples from Text DescriptionsXingjun Ma, Hanxun Huang, Tianwei Song, Ye Sun, Yifeng Gao 0002, Yu-Gang Jiang 0001. 12257-12265 [doi]
- Fighting Fire with Fire (F3): A Training-free and Efficient Visual Adversarial Example Purification Method in LVLMsYudong Zhang 0008, Ruobing Xie, Yiqing Huang, Jiansheng Chen 0001, Xingwu Sun, Zhanhui Kang, Di Wang 0052, Yu Wang 0002. 12266-12275 [doi]
- GenStream: Semantic Streaming Framework for Generative Reconstruction of Human-centric MediaEmanuele Artioli, Daniele Lorenzi, Shivi Vats, Farzad Tashtarian, Christian Timmerer. 12276-12284 [doi]
- Towards a Global Spatial-Temporal Food Memory: A Vision for Privacy-Preserving Collaborative Multimedia AnalysisZhihao Hao, Bob Zhang 0001, Haisheng Li. 12285-12294 [doi]
- Towards a Universal Query Representation for Multimodal Information RetrievalLuca Rossetto, Heiko Schuldt, Ralph Gasser. 12295-12303 [doi]
- Specify Privacy Yourself: Assessing Inference-Time Personalized Privacy Preservation Ability of Large Vision-Language ModelsXingqi Wang 0003, Xiaoyuan Yi, Xing Xie 0001, Jia Jia 0001. 12304-12313 [doi]
- Learning New Concepts, Remembering the Old: Continual Learning for Multimodal Concept Bottleneck ModelsSongning Lai, Mingqian Liao, Zhangyi Hu, Jiayu Yang, Wenshuo Chen, Hongru Xiao, Jianheng Tang 0001, Haicheng Liao, Yutao Yue. 12314-12322 [doi]
- Abstractive Visual Understanding of Multi-modal Structured Knowledge: A New Perspective for MLLM EvaluationYichi Zhang 0009, Zhuo Chen 0007, Lingbing Guo, Yajing Xu, Min Zhang, Wen Zhang 0015, Huajun Chen. 12323-12332 [doi]
- HEALTH+: Empowering Individuals via Unifying Health DataSujaya Maiyya, Shantanu Sharma 0001, Avinash Kumar. 12333-12341 [doi]
- LDW: Label Divergence Weighting for Multimodal Sentiment AnalysisQuanqi Du, Loic De Langhe, Els Lefever, Véronique Hoste. 12342-12351 [doi]
- Physics-Informed Representation Alignment for Sparse Radio-Map ReconstructionHaozhe Jia, Wenshuo Chen, Zhihui Huang, Lei Wang 0108, Hongru Xiao, Nanqian Jia, Keming Wu, Songning Lai, Bowen Tian, Yutao Yue. 12352-12360 [doi]
- The Birth of Vision LanguageAminul Islam, Md Mustakin Alam, Shaker Islam. 12361-12370 [doi]
- A Matter of Time: Revealing the Structure of Time in Vision-Language ModelsNidham Tekaya, Manuela Waldner, Matthias Zeppelzauer. 12371-12380 [doi]
- One Size Fits All? A Modular Adaptive Sanitization Kit (MASK) for Customizable Privacy-Preserving Phone Scam DetectionKangzhong Wang, Zitong Shen, Youqian Zhang, MK Michael Cheung, Xiapu Luo, Grace Ngai, Eugene Yujun Fu. 12381-12389 [doi]
- Domain-Agnostic Neural Oil Painting via Normalization Affine Test-Time AdaptationQichao Dong, Lingyu Liu, Yaxiong Wang, Jason J. R. Liu, Zhedong Zheng. 12390-12398 [doi]
- MoralCLIP: Contrastive Alignment of Vision-and-Language Representations with Moral Foundations TheoryAna Carolina Condez, Diogo Tavares, João Magalhães. 12399-12408 [doi]
- MindFuse: Towards GenAI Explainability in Marketing Strategy Co-CreationAleksandr Farseev, Marlo Ongpin, Qi Yang 0005, Ilia Gossoudarev, Yu-Yi Chu-Farseeva, Sergey I. Nikolenko. 12409-12418 [doi]
- Multifractal Comparison of Billboard and AI-Generated MusicKevin Kailun Zhang, Ying Sun, Hui Xiong. 12419-12427 [doi]
- From Hemoglobin to MOS: Towards Neuro-Based QoE Assessment Using fNIRSNatalia Jakubiec, Lucjan Janowski. 12428-12436 [doi]
- Re-examining Concept-based Explainable Models for Multimodal Interpretative TasksJulie Tores, Elisa Ancarani, Rémy Sun, Lucile Sassatelli, Hui-Yin Wu, Frédéric Precioso. 12437-12445 [doi]
- Pre-Forgettable Models: Prompt Learning as a Native Mechanism for UnlearningRutger Hendrix, Giovanni Patanè, Leonardo G. Russo, Simone Carnemolla, Federica Proietto Salanitri, Giovanni Bellitto, Concetto Spampinato, Matteo Pennisi. 12446-12454 [doi]
- RQ-Rec: Residual Quantized Hierarchical Preference Modeling for Cross-Domain RecommendationYingjun Dai, Ahmed El-Roby. 12455-12463 [doi]
- Can Audio Language Models Listen Between the Lines? A Study on Metaphorical Reasoning via UnspokenHongru Xiao, Xiang Li 0064, Duyi Pan, Longfei Zhang, Zhixue Song, Jiale Han, Songning Lai, Wenshuo Chen, Jing Tang, Benyou Wang. 12464-12472 [doi]
- A Data-driven Approach to the Longitudinal Study of Canine Vocal Pattern DevelopmentHridayesh Lekhak, Tuan M. Dang, Theron S. Wang, Kenny Q. Zhu. 12473-12482 [doi]
- Agent-to-Agent (A2A) Protocol Integrated Digital Twin System with AgentIQ for Multimodal AI Fitness Coaching and Personalized Well-BeingKamran Gholizadeh HamlAbadi, Monica (Monireh) Vahdati, Fedwa Laamarti, Abdulmotaleb El-Saddik. 12483-12491 [doi]
- Queryable 3D Scene Representation: A Multi-Modal Framework for Semantic Reasoning and Robotic Task PlanningXun Li, Rodrigo Santa Cruz, Mingze Xi, Hu Zhang 0005, Madhawa Perera, Ziwei Wang, Ahalya Ravendran, Brandon J. Matthews, Feng Xu, Matt Adcock, Dadong Wang, Jiajun Liu. 12492-12500 [doi]
- Ensuring Responses Contain Appropriate Images: Timing Judgment for Multimodal ResponsesHao Yang 0066, Tian Zheng, Yanyan Zhao, Bing Qin 0001. 12501-12508 [doi]
- Turing Patterns for Multimedia: Reaction-Diffusion Multi-Modal Fusion for Language-Guided Video Moment RetrievalXiang Fang, Wanlong Fang, Wei Ji 0008, Tat-Seng Chua. 12509-12518 [doi]
- AirScape: An Aerial Generative World Model with Motion ControllabilityBaining Zhao, Rongze Tang, Mingyuan Jia, Ziyou Wang, Fanhang Man, Xin Zhang 0123, Yu Shang, Weichen Zhang, Wei Wu 0021, Chen Gao 0001, Xinlei Chen, Yong Li 0008. 12519-12528 [doi]
- TinyServe: Query-Aware Cache Selection for Efficient LLM ServingDong Liu, Yanxuan Yu. 12529-12537 [doi]
- Robustness as Architecture: Designing IQA Models to Withstand Adversarial PerturbationsIgor Meleshin, Anna Chistyakova, Anastasia Antsiferova, Dmitriy S. Vatolin. 12538-12546 [doi]
- MAGNeT: Multimodal Adaptive Gaussian Networks for Intent Inference in Moving Target Selection across Complex ScenariosXiangxian Li, Yawen Zheng, Baiqiao Zhang, Yijia Ma, Xianhui Cao, Juan Liu 0008, Yulong Bian, Jin Huang 0009, Chenglei Yang. 12547-12555 [doi]
- OinkTrack: An Ultra-Long-Term Dataset for Multi-Object Tracking and Re-Identification of Group-Housed PigsFeng-Kai Huang, Hong-wei Xu, Chu-Chuan Lee, Hong-Yi Tu, Hong-Han Shuai, Wen-Huang Cheng. 12556-12563 [doi]
- SMIIP-NV: A Multi-Annotation Non-Verbal Expressive Speech Corpus in Mandarin for LLM-Based Speech SynthesisZhuojun Wu, Dong Liu, Juan Liu 0007, Yechen Wang, Linxi Li, Liwei Jin, Hui Bu, Pengyuan Zhang, Ming Li 0026. 12564-12570 [doi]
- A New Dataset and Benchmark for Grounding Multimodal MisinformationBingjian Yang, Danni Xu, Kaipeng Niu, Wenxuan Liu 0008, Zheng Wang 0007, Mohan Kankanhalli. 12571-12577 [doi]
- Deep-Plant-Disease Dataset Is All You Need for Plant Disease IdentificationAbel Yu Hao Chai, Kelly Li Zhen Jee, Sue Han Lee, Fei Siang Tay, Jules Vandeputte, Hervé Goëau, Pierre Bonnet, Alexis Joly. 12578-12584 [doi]
- CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal ModelsWenTao Liu, Qianjun Pan, Yi Zhang, Zhuo Liu, Ji Wu, Jie Zhou 0015, Aimin Zhou, Qin Chen 0001, Bo Jiang 0016, Liang He 0001. 12585-12591 [doi]
- A Large-scale Universal Evaluation Benchmark For Face Forgery DetectionHengrui Lou, Zunlei Feng, Jinsong Geng, Erteng Liu, Lechao Cheng, Jie Lei 0002, Jie Song 0011, Mingli Song, Yijun Bei. 12592-12599 [doi]
- LSC-ADL: An Activity of Daily Living (ADL)-Annotated Lifelog Dataset Generated via Semi-Automatic ClusteringDuy-Khang Ho, Minh-Quan Ho-Le, Van-Tu Ninh, Cathal Gurrin, Minh-Triet Tran. 12600-12606 [doi]
- WeatherBench: A Real-World Benchmark Dataset for All-in-One Adverse Weather Image RestorationQiyuan Guan, Qianfeng Yang, Xiang Chen 0015, Tianyu Song 0003, Guiyue Jin, Jiyu Jin. 12607-12613 [doi]
- MSITrack: A Challenging Benchmark for Multispectral Single Object TrackingTao Feng, Tingfa Xu, Haolin Qin, Tianhao Li, Shuaihao Han, Xuyang Zou, Zhan Lv, Jianan Li 0001. 12614-12620 [doi]
- MSC: A Marine Wildlife Dataset for Video Understanding with Grounded Segmentation and Clip-Level CaptionsQuang-Trung Truong, Yuk-Kwan Wong, Vo Hoang Kim Tuyen Dang, Rinaldi Gotama, Duc Thanh Nguyen, Sai Kit Yeung. 12621-12628 [doi]
- The CASTLE 2024 Dataset: Advancing the Art of Multimodal UnderstandingLuca Rossetto, Werner Bailer, Duc-Tien Dang-Nguyen, Graham Healy, Björn Þór Jónsson 0001, Onanong Kongmeesub, Hoang Bao Le, Stevan Rudinac, Klaus Schöffmann, Florian Spiess 0001, Allie Tran, Minh-Triet Tran, Quang-Linh Tran, Cathal Gurrin. 12629-12635 [doi]
- CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal DissectionGuankun Wang, Han Xiao 0010, Renrui Zhang, Huxin Gao, Long Bai 0008, Xiaoxiao Yang, Zhen Li 0026, Hongsheng Li 0001, Hongliang Ren 0001. 12636-12643 [doi]
- EmoArt: A Multidimensional Dataset for Emotion-Aware Artistic GenerationCheng Zhang, Hongxia Xie, Bin Wen, Songhan Zuo, Ruoxuan Zhang, Wen-Huang Cheng. 12644-12650 [doi]
- Genesis: A Large-Scale Benchmark for Multimodal Large Language Model in Emotional Causality AnalysisYulong Li 0002, Yuxuan Zhang, Rui Chen, Feilong Tang, Zhixiang Lu, Ming Hu, Jianghao Wu 0001, Haochen Xue, Mian Zhou, Chong Li, Jionglong Su, Imran Razzak. 12651-12658 [doi]
- RecipeGen: A Step-Aligned Multimodal Benchmark for Real-World Recipe GenerationRuoxuan Zhang, Jidong Gao, Bin Wen, Hongxia Xie, Chenming Zhang, Hong-Han Shuai, Wen-Huang Cheng. 12659-12665 [doi]
- DFBench: Benchmarking Deepfake Image Detection Capability of Large Multimodal ModelsJiarui Wang, Huiyu Duan, Juntong Wang, Ziheng Jia, Woo Yi Yang, Xiaorong Zhu, Yu Zhao, Jiaying Qian, Yuke Xing, Guangtao Zhai, Xiongkuo Min. 12666-12673 [doi]
- EditWorld: Simulating World Dynamics for Instruction-Following Image EditingBohan Zeng, Ling Yang 0006, Jiaming Liu, Minghao Xu, Yuanxing Zhang, Pengfei Wan 0001, Wentao Zhang 0001, Shuicheng Yan. 12674-12681 [doi]
- 3DGS-IEval-15K: A Large-scale Image Quality Evaluation Database for 3D Gaussian-SplattingYuke Xing, Jiarui Wang, Peizhi Niu, Wenjie Huang, Guangtao Zhai, Yiling Xu. 12682-12689 [doi]
- GOBench: Benchmarking Geometric Optics Generation and Understanding of MLLMsXiaorong Zhu, Ziheng Jia, Jiarui Wang, Xiangyu Zhao, Haodong Duan, Xiongkuo Min, Jia Wang 0004, Zicheng Zhang, Guangtao Zhai. 12690-12697 [doi]
- VER-Bench: Evaluating MLLMs on Reasoning with Fine-Grained Visual EvidenceChenhui Qiang, Zhaoyang Wei, Xumeng Han, Zipeng Wang, Siyao Li, Xiangyuan Lan, Jianbin Jiao, Zhenjun Han. 12698-12705 [doi]
- RoboAfford: A Dataset and Benchmark for Enhancing Object and Spatial Affordance Learning in Robot ManipulationYingbo Tang, LingFeng Zhang, Shuyi Zhang, Yinuo Zhao, Xiaoshuai Hao. 12706-12713 [doi]
- Multi-Accent Mandarin Dry-Vocal Singing Dataset: Benchmark for Singing Accent RecognitionZihao Wang, Shulei Ji, Le Ma 0002, Yuhang Jin, Shun Lei, Jianyi Chen, Haoying Fu, Roger B. Dannenberg, Kejun Zhang. 12714-12721 [doi]
- SmokeBench: A Real-World Dataset for Surveillance Image Desmoking in Early-Stage Fire ScenesWenzhuo Jin, Qianfeng Yang, Xianhao Wu, Hongming Chen, Pengpeng Li, Xiang Chen. 12722-12728 [doi]
- SpineBench: Benchmarking Multimodal LLMs for Spinal Pathology AnalysisChenghanyu Zhang, Zekun Li 0001, Peipei Li, Xing Cui, Shuhan Xia, Weixiang Yan, Yiqiao Zhang, Qianyu Zhuang. 12729-12736 [doi]
- AU-IQA: A Benchmark Dataset for Perceptual Quality Assessment of AI-Enhanced User-Generated ContentShushi Wang, Chunyi Li, Zicheng Zhang, Han Zhou 0003, Wei Dong 0010, Jun Chen 0005, Guangtao Zhai, Xiaohong Liu 0001. 12737-12744 [doi]
- Video-CoT: A Comprehensive Dataset for Spatiotemporal Understanding of Videos Based on Chain-of-ThoughtShuyi Zhang, Xiaoshuai Hao, Yingbo Tang, LingFeng Zhang, Pengwei Wang 0005, Zhongyuan Wang 0006, Hongxuan Ma, Shanghang Zhang. 12745-12752 [doi]
- VIHand: Enhancing 3D Hand Pose Estimation with Visual-Inertial BenchmarkXinyi Wang, Pengfei Ren 0001, Haoyang Zhang, Xin Sheng, Da Li, Liang Xie 0012, Yue Gao, Erwei Yin. 12753-12760 [doi]
- Language-Driven 3D Human Pose Estimation in Multi-Person Scenarios: A New Dataset and ApproachTingrui Shen, Bangzhen Liu, Zhirun Fan, Shiting Zhang, Weifeng Pan, Fan Sun, Dan Cao, Shengfeng He. 12761-12768 [doi]
- PA-HOI: A Physics-Aware Human and Object Interaction DatasetRuiyan Wang, Lin Zuo, Zonghao Lin, Qiang Wang, Zhengxue Cheng, Rong Xie 0004, Jun Ling, Li Song 0001. 12769-12775 [doi]
- FineBadminton: A Multi-Level Dataset for Fine-Grained Badminton Video UnderstandingXusheng He, Wei Liu, Shanshan Ma, Qian Liu, Chenghao Ma, Jianlong Wu. 12776-12783 [doi]
- Open3D-VQA: A Benchmark for Embodied Spatial Concept Reasoning with Multimodal Large Language Model in Open SpaceWeichen Zhang, Zile Zhou, Xin Zeng, Xuchen Liu, Jianjie Fang, Chen Gao 0001, Jinqiang Cui, Yong Li, Xinlei Chen, Xiao-Ping Zhang 0002. 12784-12791 [doi]
- UEMM-Air: Enable UAVs to Undertake More Multi-modal TasksLiang Yao, Fan Liu 0003, Shengxiang Xu, Chuanyi Zhang, Shimin Di, Xing Ma, Jianyu Jiang, Zequan Wang, Jun Zhou 0001. 12792-12798 [doi]
- PhysLab: A Benchmark Dataset for Multi-Granularity Visual Parsing of Physics ExperimentsMinghao Zou, Qingtian Zeng, Yongping Miao, Shangkun Liu, Zilong Wang, Hantao Liu, Wei Zhou 0021. 12799-12806 [doi]
- SynthVLM: Towards High-Quality and Efficient Synthesis of Image-Caption Datasets for Vision-Language ModelsZheng Liu, Hao Liang 0017, Bozhou Li, Wentao Xiong, Chong Chen 0001, Conghui He, Wentao Zhang 0001, Bin Cui 0001. 12807-12814 [doi]
- EEG-Face: A Facial-Image Stimulated EEG Data-Set for Analysis of Brain Perceived MultimediaWuxia Zhang, Yang Xin 0004, Shibo Lv, Xin Zhang, Xiang Zhong, Jianmin Jiang. 12815-12821 [doi]
- A Dataset and Metric for Textual Video Content DescriptionStefan J. Arzberger, Paul Raith, Werner Bailer, Marion Jaks. 12822-12828 [doi]
- MMESGBench: Pioneering Multimodal Understanding and Complex Reasoning Benchmark for ESG TasksLei Zhang 0199, Xin Zhou 0008, Chaoyue He, Di Wang 0004, Yi Wu, Hong Xu 0004, Wei Liu, Chunyan Miao. 12829-12836 [doi]
- AnaFig: A Human-Aligned Dataset for Scientific Figure AnalysisTan Yue, Xuzhao Shi, Rui Mao 0010, Zilong Song, Zonghai Hu, Dongyan Zhao 0001. 12837-12843 [doi]
- Evaluating Perceptual Color Preferences in Smartphone Photography: Dataset and ChallengesZhihua Wang 0002, Weixia Zhang, Wei Zhou 0021, Xiaohong Liu 0001, Guangtao Zhai, Patrick Le Callet. 12844-12850 [doi]
- Unlocking Joint Image Deraining and Low-Light Enhancement: Benchmark and BaselineLiang Cheng, Hao Wang, Chenwei Wu, Haochen You, Xianhao Wu. 12851-12858 [doi]
- HAN: Korean Heritage Augmented Narrative Visual-Language Description DatasetSungHyun Moon, Aidyn Zhakatayev, Seungjae Lee. 12859-12866 [doi]
- MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and ReasoningZiliang Gan, Dong Zhang, Haohan Li, Yang Wu, Xueyuan Lin, Ji Liu 0003, Haipang Wu, Chaoyou Fu, Zenglin Xu, Rongjunchen Zhang, Yong Dai 0001. 12867-12874 [doi]
- MuMMy: Multimodal Dataset supporting VLM-based Egyptology Research AssistantMaksim Golyadkin, Innokentiy Humonen, Valeria Rubanova, Danil Kalin, Ianis Plevokas, Dmitry Nikolotov, Aleksandr Utkov, Nikita Sidelnikov, Petr Ivanov, Ekaterina Bureeva, Ekaterina Alexandrova, Ilya Makarov. 12875-12881 [doi]
- MultiEgo: A Multi-View Egocentric Video Dataset for 4D Scene ReconstructionBate Li, Houqiang Zhong, Zhengxue Cheng, Qiang Hu 0003, Qiang Wang, Li Song 0001, Wenjun Zhang 0001. 12882-12889 [doi]
- MORE: Multi-Organ Medical Image REconstruction DatasetShaokai Wu, Yapan Guo, Yanbiao Ji, Jing Tong, Yuxiang Lu, Mei Li, Suizhi Huang, Yue Ding 0001, Hongtao Lu 0001. 12890-12896 [doi]
- Towards High Robust Vision-Language Large Models: Benchmark and MethodMinyi Zhao, Yi Liu, Wensong He, Bingzhe Yu, Yuxi Mi, Shuigeng Zhou. 12897-12904 [doi]
- RSVLM-QA: A Benchmark Dataset for Remote Sensing Vision Language Model-based Question AnsweringXing Zi, Jinghao Xiao, Yunxiao Shi, Xian Tao, Jun Li 0010, Ali Braytee, Mukesh Prasad. 12905-12911 [doi]
- Beyond Snapshots: A Multimodal User-Level Dataset for Depression Detection in Dynamic Social Media StreamsBichen Wang, Yixin Sun, Yanyan Zhao, Bing Qin 0001. 12912-12918 [doi]
- ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving SystemsChenxi Wang, Jizhan Fang, Xiang Chen 0016, Bozhong Tian, Ziwen Xu, Huajun Chen, Ningyu Zhang 0001. 12919-12926 [doi]
- BIMCompNet: Multimodal Dataset for Geometric Deep Learning in Building Information ModelMingsong Yang, Xinhong Hei 0001, Kehai Chen, Haining Meng, Haoyang Dong, Qin Zhao. 12927-12933 [doi]
- Towards a New Paradigm of Visual Signal CompressionChunyi Li, Xiele Wu, Haoning Wu 0001, Donghui Feng 0003, Zicheng Zhang, Guo Lu, Xiongkuo Min, Xiaohong Liu 0001, Guangtao Zhai, Weisi Lin. 12934-12941 [doi]
- MathScape: Benchmarking Multimodal Large Language Models in Real-World Mathematical ContextsHao Liang 0017, Linzhuang Sun, Minxuan Zhou, Zirong Chen, Meiyi Qiang, Mingan Lin, Tianpeng Li, Fan Yang, Zenan Zhou, Wentao Zhang 0001. 12942-12948 [doi]
- eSkinHealth: A Multimodal Dataset for Neglected Tropical Skin DiseasesJanet Wang, Xin Hu, Yunbei Zhang, Diabate Almamy, Vagamon Bamba, Konan Amos Sébastien Koffi, Koffi Aubin Yao, Zhengming Ding, Jihun Hamm, Rie Roselyne Yotsu. 12949-12956 [doi]
- MDPE: A Multimodal Deception Dataset with Personality and Emotional CharacteristicsCong Cai, Shan Liang, Xuefei Liu, Kang Zhu, Zhengqi Wen, Jianhua Tao 0001, Heng Xie, Jizhou Cui, Yiming Ma, Zhenhua Cheng, Hanzhe Xu, Ruibo Fu, Bin Liu 0041, Yongwei Li. 12957-12964 [doi]
- Camouflaged Object Tracking: A BenchmarkXiaoyu Guo, Pengzhi Zhong, Hao Zhang, Defeng Huang, Huikai Shao, Qijun Zhao, Shuiwang Li. 12965-12972 [doi]
- Advancing 3D Scene Understanding with MV-ScanQA Multi-View Reasoning Evaluation and TripAlign Pre-training DatasetWentao Mo, Qingchao Chen, Yuxin Peng 0001, Siyuan Huang, Yang Liu 0105. 12973-12980 [doi]
- SEAR: A Multimodal Dataset for Analyzing AR-LLM-Driven Social Engineering BehaviorsTianlong Yu, Chenghang Ye, Zheyu Yang 0002, Ziyi Zhou 0006, Cui Tang, Zui Tao, Jun Zhang, Kailong Wang 0001, Liting Zhou, Yang Yang 0060, Ting Bi. 12981-12987 [doi]
- SVD: Spatial Video DatasetMohammad Hossein Izadimehr, Milad Ghanbari, Guodong Chen 0004, Wei Zhou 0021, Xiaoshuai Hao, Mallesham Dasari, Christian Timmerer, Hadi Amirpour. 12988-12994 [doi]
- ICS-MR: Interactive Conversation Scenarios for Assessment of Mixed Reality CommunicationFelix Immohr, Gareth Rendle, Annika Neidhardt, Anton Benjamin Lammert, Bernd Froehlich 0001, Alexander Raake. 12995-13001 [doi]
- Beyond the Individual: Introducing Group Intention Forecasting with SHOT DatasetRuixu Zhang, Yuran Wang, Xinyi Hu, Chaoyu Mai, Wenxuan Liu 0008, Danni Xu, Xian Zhong, Zheng Wang 0007. 13002-13008 [doi]
- MetaWild: A Multimodal Dataset for Animal Re-Identification with Environmental MetadataYuzhuo Li, Di Zhao, Tingrui Qiao, Yihao Wu, Bo Pang, Yun Sing Koh. 13009-13015 [doi]
- ViewGauss: A Head Movement Dataset for 6DoF Gaussian Splatting Video ViewingZhixia Zhao, Qiyue Li 0001, Jie Li 0015, Richang Hong, Zhi Liu 0002. 13016-13022 [doi]
- UAV-ON: A Benchmark for Open-World Object Goal Navigation with Aerial AgentsJianqiang Xiao, Yuexuan Sun, Yixin Shao, Boxi Gan, Rongqiang Liu, Yanjin Wu, Weili Guan, Xiang Deng 0002. 13023-13029 [doi]
- DreamFrame: Enhancing Video Understanding via Automatically Generated QA and Style-Consistent KeyframesZhende Song, Chenchen Wang, Jiamu Sheng, Chi Zhang, Shengji Tang, Jiayuan Fan 0001, Tao Chen 0003. 13030-13037 [doi]
- FingerVeinSyn-5M: A Million-Scale Dataset and Benchmark for Finger Vein RecognitionYifan Wang, Jie Gui, Baosheng Yu, Qi Li, Zhenan Sun, Juho Kannala, Guoying Zhao 0001. 13038-13045 [doi]
- Referring Multi-Object Tracking in Satellite Videos: A New Benchmark and BaselinePeirong Zhang, Yidan Zhang, Hanru Shi, Dianyu Wang, Xiaoxuan Liu, Lei Wang. 13046-13052 [doi]
- Gaze into the Heart: A Multi-View Video Dataset for rPPG and Health Biomarkers EstimationKonstantin Egorov, Stepan Botman, Pavel Blinov, Galina Zubkova, Anton Ivaschenko, Alexander Kolsanov, Andrey V. Savchenko. 13053-13059 [doi]
- WisWheat: A Three-Tiered Vision-Language Dataset for Wheat ManagementBowen Yuan, Selena Song, Javier Fernandez, Yadan Luo, Mahsa Baktashmotlagh, Zijian Wang 0009. 13060-13067 [doi]
- Are Synthetic Videos Useful? A Benchmark for Retrieval-Centric Evaluation of Synthetic VideosZecheng Zhao, Selena Song, Tong Chen 0005, Zhi Chen 0010, Shazia Sadiq, Yadan Luo. 13068-13074 [doi]
- Nature-1k: The Raw Beauty of Nature in 4K at 60FPSMohammad Ghasempour, Hadi Amirpour, Christian Timmerer. 13075-13081 [doi]
- RCQoEA-360VR: Real-time Continuous QoE Scores for HMD-based 360° VR DatasetSowmya Vijayakumar, Tong Xue, Abdallah El-Ali, Irene Viola 0001, Ronan Flynn, Peter Corcoran 0001, Pablo César, Niall Murray. 13082-13088 [doi]
- AudioSet-R: A Refined AudioSet with Multi-Stage LLM Label ReannotationYulin Sun, Qisheng Xu, Yi Su 0010, Qian Zhu, Yong Dou, Xinwang Liu 0002, Kele Xu. 13089-13096 [doi]
- Valor32k-AVQA v2.0: Open-Ended Audio-Visual Question Answering Dataset and BenchmarkInes Riahi, Abduljalil Radman, Zixin Guo, Rachid Hedjam, Jorma Laaksonen. 13097-13103 [doi]
- EgoMusic: An Egocentric Augmented Reality Glasses Dataset for MusicAlessandro Ragano, Carl Timothy Tolentino, Kata Szita, Dan Barry, Davoud Shariat Panah, Niall Murray, Andrew Hines. 13104-13111 [doi]
- UVG-CWI-DQPC: Dual-Quality Point Cloud Dataset for Volumetric Video ApplicationsGuillaume Gautier, Xuemei Zhou, Thong Nguyen, Jack Jansen 0001, Louis Fréneau, Marko Viitanen, Uyen Phan, Jani Käpylä, Irene Viola 0001, Alexandre Mercat, Pablo César, Jarno Vanne. 13112-13118 [doi]
- OpenEvents V1: Large-Scale Benchmark Dataset for Multimodal Event GroundingHieu Nguyen, Phuc-Tan Nguyen, Thien Phuc Tran, Minh-Quang Nguyen, Tam V. Nguyen 0002, Minh-Triet Tran, Trung-Nghia Le. 13119-13125 [doi]
- EyeNavGS: A 6-DoF Navigation Dataset and Record-n-Replay Software for Real-World 3DGS Scenes in VRZihao Ding, Cheng-Tse Lee, Mufeng Zhu, Tao Guan, Yuan-Chun Sun, Cheng-Hsin Hsu, Yao Liu 0001. 13126-13132 [doi]
- Argus Inspection: Do Multimodal Large Language Models Possess the Eye of Panoptes?Yang Yao, Lingyu Li, Jiaxin Song, Chiyu Chen, Zhenqi He, Yixu Wang, Xin Wang 0119, Tianle Gu, Jie Li 0052, Yan Teng 0002, Yingchun Wang 0004. 13133-13140 [doi]
- GynSurg: A Comprehensive Gynecology Laparoscopic Surgery DatasetSahar Nasirihaghighi, Negin Ghamsarian, Leonie Peschek, Matteo Munari, Heinrich Husslein, Raphael Sznitman, Klaus Schoeffmann. 13141-13147 [doi]
- MISP-QEKS: A Large-Scale Dataset with Multimodal Cues for Query-by-Example Keyword SpottingShifu Xiong, Hang Chen 0001, Shi Cheng 0001, Kai Shen, Hengshun Zhou, Genshun Wan, Chenyue Zhang, Kewei Li, Jun Du 0002, Lirong Dai 0001. 13148-13155 [doi]
- UniSVG: A Unified Dataset for Vector Graphic Understanding and Generation with Multimodal Large Language ModelsJinke Li, Jiarui Yu, Chenxing Wei, Hande Dong, Qiang Lin, Liangjing Yang, Zhicai Wang, Yanbin Hao. 13156-13163 [doi]
- UR-MAT: A Multimodal, Material-Aware Synthetic Dataset of Urban ScenariosDebora Russo, Nicola Mazzocca, Valeria Vittorini. 13164-13169 [doi]
- FRED: The Florence RGB-Event Drone DatasetGabriele Magrini, Niccolò Marini, Federico Becattini, Lorenzo Berlincioni, Niccolò Biondi, Pietro Pala, Alberto Del Bimbo. 13170-13176 [doi]
- DeHate: A Holistic Hateful Video Dataset for Explicit and Implicit Hate DetectionYuchen Zhang, Tailin Chen, Jiangbei Yue, Yueming Sun, Rahul Singh, Jianbo Jiao, Zeyu Fu. 13177-13183 [doi]
- Waymo-3DSkelMo: A Multi-Agent 3D Skeletal Motion Dataset for Pedestrian Interaction Modeling in Autonomous DrivingGuangxun Zhu, Shiyu Fan, Hang Dai, Edmond S. L. Ho. 13184-13190 [doi]
- WetCat: Enabling Automated Skill Assessment in Wet-Lab Cataract Surgery VideosNegin Ghamsarian, Raphael Sznitman, Klaus Schoeffmann, Jens Kowal. 13191-13197 [doi]
- Investigating Domain Gaps for Indoor 3D Object DetectionZijing Zhao 0004, Zhu Xu, Qingchao Chen, Yuxin Peng 0001, Yang Liu 0105. 13198-13205 [doi]
- FoodLogAthl-218: Constructing a Real-World Food Image Dataset Using Dietary Management ApplicationsMitsuki Watanabe, Sosuke Amano, Kiyoharu Aizawa, Yoko Yamakata. 13206-13212 [doi]
- GEMeX-RMCoT: An Enhanced Med-VQA Dataset for Region-Aware Multimodal Chain-of-Thought ReasoningBo Liu 0049, Xiangyu Zhao, Along He, Yidi Chen, Huazhu Fu, Xiao-Ming Wu 0003. 13213-13220 [doi]
- VIDEA-8K-60FPS Dataset: 8K 60FPS Video Sequences for Analysis and DevelopmentTariq Al Shoura, Ali Mollaahmadi Dehaghi, Reza Razavi, Mohammad Moshirpour. 13221-13227 [doi]
- CH-SV: A Benchmark for Multi-Type Chinese Harmful Short Video DetectionLinlin Zong, Shilin Sui, Wenjun Liang, Wanyu Song, Linlin Tian, Xinyue Liu 0002, Xianchao Zhang 0001, Bo Xu 0009. 13228-13234 [doi]
- MFFI: Multi-Dimensional Face Forgery Image Dataset for Real-World ScenariosChangtao Miao, Yi Zhang, Man Luo, Weiwei Feng, Kaiyuan Zheng, Qi Chu 0001, Tao Gong, Jianshu Li, Yunfeng Diao, Wei Zhou 0021, Joey Tianyi Zhou, Xiaoshuai Hao. 13235-13242 [doi]
- MUDI: A Multimodal Biomedical Dataset for Understanding Pharmacodynamic Drug-Drug InteractionsTung-Lam Ngo, Ba Hoang Tran, Duy-Cat Can, Trung Hieu Do, Oliver Y. Chén, Hoang-Quynh Le. 13243-13249 [doi]
- LUMOS: A Lumbar Multimodal Osteoporosis Screening Dataset with X-ray and CT imagesKeyue Shi, Qianqian Shen, Zhaoming Ye, Liangjun Jiang, Jiajun Bu, Haishuai Wang. 13250-13257 [doi]
- Multiverse Through Deepfakes: The MultiFakeVerse Dataset of Person-Centric Visual and Conceptual ManipulationsParul Gupta, Shreya Ghosh 0001, Tom Gedeon, Thanh-Toan Do, Abhinav Dhall. 13258-13265 [doi]
- AudioAtlas: A Comprehensive and Balanced Benchmark Towards Movie-Oriented Text-to-Audio GenerationChenxi Wang, Yusheng Dai, Lei Sun 0010, Jun Du 0002, Jianqing Gao. 13266-13272 [doi]
- FormFactory: An Interactive Benchmarking Suite for Multimodal Form-Filling AgentsBobo Li 0001, Yuheng Wang, Hao Fei 0001, Juncheng Li 0006, Wei Ji 0008, Mong-Li Lee, Wynne Hsu. 13273-13280 [doi]
- EmotionalCanines: A Dataset for Analysis of Arousal and Valence in Dog VocalizationTuan M. Dang, Theron S. Wang, Hridayesh Lekhak, Kenny Q. Zhu. 13281-13288 [doi]
- SVGenius: Benchmarking LLMs in SVG Understanding, Editing and GenerationSiqi Chen, Xinyu Dong, Haolei Xu, Xingyu Wu, Fei Tang, Hang Zhang, Yuchen Yan, Linjuan Wu, Wenqi Zhang 0001, Guiyang Hou, Yongliang Shen 0001, Weiming Lu 0001, Yueting Zhuang. 13289-13296 [doi]
- Chart-HQA: A Benchmark for Hypothetical Question Answering in ChartsXiangnan Chen, Yuancheng Fang, Juncheng Li 0006, Qian Xiao, Jun Lin, Siliang Tang, Yueting Zhuang. 13297-13303 [doi]
- HateClipSeg: A Segment-Level Annotated Dataset for Fine-Grained Hate Video DetectionHan Wang, Zhuoran Wang, Roy Ka-Wei Lee. 13304-13310 [doi]
- EditGarment: An Instruction-Based Garment Editing Dataset Constructed with Automated MLLM Synthesis and Semantic-Aware EvaluationDeqiang Yin, Junyi Guo, Huanda Lu, Fangyu Wu 0001, Dongming Lu. 13311-13317 [doi]
- Challenging Cases of Neural Image Compression: A Dataset of Visually Compelling Yet Semantically Incorrect ReconstructionsNora Hofer, Rainer Böhme. 13318-13324 [doi]
- MultiRef: Controllable Image Generation with Multiple Visual ReferencesRuoxi Chen, Dongping Chen, Siyuan Wu 0001, Sinan Wang, Shiyun Lang, Peter Sushko, Gaoyang Jiang, Yao Wan 0001, Ranjay Krishna. 13325-13331 [doi]
- A Spatial Relationship Aware Dataset for RoboticsPeng Wang 0076, Minh Huy Pham, Zhihao Guo, Wei Zhou 0021. 13332-13338 [doi]
- MCOD: The First Challenging Benchmark for Multispectral Camouflaged Object DetectionYang Li, Tingfa Xu, Shuyan Bai, Peifu Liu, Jianan Li 0001. 13339-13345 [doi]
- AEGIS: Authenticity Evaluation Benchmark for AI-Generated Video SequencesJieyu Li, Xin Zhang 0092, Joey Tianyi Zhou. 13346-13353 [doi]
- Low-light Image Enhancement Quality Assessment: A Real-World Dataset and An Objective MethodChunyi Li, Bo Hu 0008, Taiyang Chen, Leida Li, Lihuo He, Xinbo Gao 0001. 13354-13361 [doi]
- OTR: Synthesizing Overlay Text Dataset for Text RemovalJan Zdenek, Wataru Shimoda, Kota Yamaguchi. 13362-13368 [doi]
- DogSpeak: A Canine Vocalization Classification DatasetHridayesh Lekhak, Theron S. Wang, Tuan M. Dang, Kenny Q. Zhu. 13369-13375 [doi]
- HVEval: Towards Unified Evaluation of Human-Centric Video Generation and UnderstandingSijing Wu, Yunhao Li, Huiyu Duan, Yanwei Jiang, Yucheng Zhu, Guangtao Zhai. 13376-13383 [doi]
- DPCSet: A Large-scale Dynamic Point Cloud Dataset for Compression and PerceptionWenxu Gao, Liang Xie, Kangli Wang, Jingxuan Su, Changhao Peng, Wei Gao 0003. 13384-13390 [doi]
- ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional DependenciesChenglin Wang 0010, Yucheng Zhou 0001, Qianning Wang, Zhe Wang, Kai Zhang. 13391-13397 [doi]
- T23D-QA: An Open Dataset and Benchmark for Text-driven 3D Generation Quality AssessmentHaohui Li, Bowen Qu, Wei Gao 0003. 13398-13404 [doi]
- LEHA-CVQAD: Dataset To Enable Generalized Video Quality Assessment of Compression ArtifactsAleksandr Gushchin, Maksim Smirnov, Dmitriy S. Vatolin, Anastasia Antsiferova. 13405-13412 [doi]
- RGC-VQA: An Exploration Database for Robotic-Generated Video Quality AssessmentJianing Jin, Jiangyong Ying, Huiyu Duan, Liu Yang, Sijing Wu, Yunhao Li, Yushuo Zheng, Xiongkuo Min, Guangtao Zhai. 13413-13420 [doi]
- BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated VideosJiahao Lin, Weixuan Peng, Bojia Zi, Yifeng Gao 0002, Xianbiao Qi, Xingjun Ma, Yu-Gang Jiang 0001. 13421-13427 [doi]
- Screen Content Video Dataset and BenchmarkNickolay Safonov, Rakhmanov Mikhail, Dmitriy S. Vatolin. 13428-13434 [doi]
- RealFactBench: A Benchmark for Evaluating Large Language Models in Real-World Fact-CheckingShuo Yang 0011, Yuqin Dai, Guoqing Wang, Xinran Zheng, Jinfeng Xu 0003, Jinze Li 0001, Zhenzhe Ying, Weiqiang Wang 0002, Edith C. H. Ngai. 13435-13441 [doi]
- SHALE: A Scalable Benchmark for Fine-grained Hallucination Evaluation in LVLMsBei Yan, Zhiyuan Chen, Yuecong Min, Jie Zhang 0071, Jiahao Wang, Xiaozhen Wang, Shiguang Shan. 13442-13449 [doi]
- Compressed Feature Quality Assessment: Dataset and BaselinesChangsheng Gao, Wei Zhou 0021, Guosheng Lin, Weisi Lin. 13450-13456 [doi]
- Small Stickers, Big Meanings: A Multilingual Sticker Semantic Understanding Dataset with a Gamified ApproachHeng Er Metilda Chee, Jiayin Wang, Zhiqiang Guo, Weizhi Ma, Min Zhang. 13457-13463 [doi]
- StreamingCoT: A Dataset for Temporal Dynamics and Multimodal Chain-of-Thought Reasoning in Streaming VideoQAYuhang Hu, Zhenyu Yang 0009, Shihan Wang 0006, Shengsheng Qian, Bin Wen, Fan Yang 0094, Tingting Gao, Changsheng Xu. 13464-13470 [doi]
- Implementation of Visualizer for Beats and ScratchesMasatoshi Hamanaka. 13471-13473 [doi]
- MetAdv: A Unified and Interactive Adversarial Testing Platform for Autonomous DrivingAishan Liu, Jiakai Wang, Tianyuan Zhang 0004, Hainan Li, Jiangfan Liu, Siyuan Liang 0004, Yilong Ren, Xianglong Liu 0001, Dacheng Tao. 13474-13476 [doi]
- 'Hi AirStar, Guide Me to the Badminton Court.'Ziqin Wang, Jinyu Chen, Xiangyi Zheng, Qinan Liao, Linjiang Huang, Si Liu 0001. 13477-13479 [doi]
- SLIVeR: A Narrative VR Experience for Immersive Lifelog ExplorationLiang Xu, Songkai Jia, Cathal Gurrin, Allie Tran. 13480-13482 [doi]
- An Aesthetic Cultural Relic Poster Generation Framework Based on Multi-target Learning and Multimodal Large Language ModelMohan Zhang, QianQian Hu, Chuhan Li, Yanxiu Dan, Shenglan Cui, Fang Liu 0002. 13483-13485 [doi]
- KDTalker++: Controllable Talking Portrait Generation with Audio, Text, and Expression EditingChaolong Yang, Yinuo Guo, Kai Yao, Yuyao Yan, Jie Sun 0024, Kaizhu Huang. 13486-13488 [doi]
- Safe Semantics, Unsafe Interpretations: Tackling Implicit Reasoning Safety in Large Vision-Language ModelsWei Cai, Jian Zhao 0013, Yuchu Jiang, Tianle Zhang, Xuelong Li 0001. 13489-13491 [doi]
- Omni-LLaMA-AD: A Unified Model for Open-Set Visual Anomaly DetectionRongyu Zhang, Zhanbin Hu, Jiamu Wang, Qiang Zhu. 13492-13494 [doi]
- PrivEdit: A Zero-Shot Interactive Image Privacy Editing SystemXiao Chen, Wenrui He, Meng Wang, Zhanbin Hu, Chaoquan Shen, Qiang Zhu. 13495-13497 [doi]
- Talk, Imagine, Evolve: A Unified Multimodal Agent for Seamless Visual Generation and EditingZhaofan Qiu, Zijian Gong, Yingwei Pan, Ting Yao 0003, Tao Mei 0001. 13498-13500 [doi]
- HL-EAI: A Multimodal Framework Enabling Emotional Reciprocity in Human-AI Strategic Decision-MakingMikhail Mozikov, Daniil Orekhov, Ivan Nasonov, Konstantin Baltsat, Vladislav Pedashenko, Dmitrii Abramov, Nikita Severin, Yury Maximov, Andrey Savchenko, Ilya Makarov. 13501-13503 [doi]
- Depth-Enabled Inspection of Medical VideosHadi Amirpour, Doris Putzgruber-Adamitsch, Klaus Schoeffmann. 13504-13506 [doi]
- Edit-by-Example: Adaptive Exemplar-Based Image EditingYaojie Li, Yiheng Zhang, Zhaofan Qiu, Yingwei Pan, Wu Liu 0005, Ting Yao 0003, Tao Mei 0001. 13507-13509 [doi]
- Streamlining Virtual KOL Generation Through Modular Generative AI ArchitectureTan-Hiep To, Duy Khang Nguyen, Minh-Triet Tran, Trung-Nghia Le. 13510-13512 [doi]
- CounterHelp: Promoting Online Civil Courage Among Young People Through AI-Generated CounterspeechAndreas Babic, Xihui Chen, Djordje Slijepcevic, Adrian Jaques Böck, Matthias Zeppelzauer. 13513-13515 [doi]
- MAXplain: A Multi-Agent System for Interactive Multimodal Hate Speech DetectionNils Riekers, Marten Risius, Tong Chen 0005. 13516-13518 [doi]
- Real-Time SSL Sperm Whale Click Detector: Interactive Web DemoAnvar Iskhakov, Viktor Kovalev, Vladislav Naumov, Ilya Makarov. 13519-13521 [doi]
- EEG-Driven Image Reconstruction with Saliency-Guided Diffusion ModelsIgor Abramov. 13522-13524 [doi]
- FCM-RT: Real-Time Feature Coding for MachinesAshan Perera, Md Eimran Hossain Eimon, Juan Merlos, Velibor Adzic, Hari Kalva, Borko Furht. 13525-13527 [doi]
- CReLeRI: Explainable, Concept-centric, Representation, Learning, Reasoning, and Interaction Video Analysis SystemMichael Francis Perez, Yichi Yang, Yuheng Zha, Enze Ma, Danish Nisar Ahmed Tamboli, Haodi Ma, Reza Shahriari, Vyom Pathak, Dzmitry Kasinets, Rohith Venkatakrishnan, Daisy Zhe Wang, Jaime Ruiz, Eric D. Ragan, Zhiting Hu, Eric P. Xing, Jun-Yan Zhu. 13528-13530 [doi]
- Permission to Dance: An End-to-End Dance Enhancement System from Dance Capture to AnalysisJungsu Kim, Jungwoo Huh, Yeseung Park, Seongjean Kim, JeongWook Choi, Sanghoon Lee 0001. 13531-13533 [doi]
- Advancing Fashion Design Through Intelligent Sketchpad StudioNhu-Binh Nguyen Truc, Nhu-Vinh Hoang, Tam V. Nguyen 0002, Minh-Triet Tran, Trung-Nghia Le. 13534-13536 [doi]
- Anywhere Avatar: 3D Telepresence with Just a Phone and a LaptopRuifan Ji, Mingyuan Wu, Bo Chen 0025, Michael Zink, Ramesh K. Sitaraman, Jacob Chakareski, Klara Nahrstedt. 13537-13539 [doi]
- GenWardrobe: A Fully Generative System for Travel Fashion Wardrobe ConstructionPeng Jin, Yilin Wen 0009, Mingzhe Yu, Yunshan Ma, Rong Zheng, Jin-Tu Fan, Chong-Wah Ngo. 13540-13542 [doi]
- SDART: Spatial Dart AR Simulation with Hand-Tracked InputMilad Ghanbari, Wei Zhou 0021, Cosmin Stejerean, Christian Timmerer, Hadi Amirpour. 13543-13545 [doi]
- FaceCluster: Interactive Photo Organization with Enhanced Face RecognitionAlexander Filonenko, Ilya Makarov, Andrey V. Savchenko. 13546-13548 [doi]
- MindSpeak: A Real-Time BCI System for Silent SpeechJinzhao Zhou, Daniel Leong, Zehong Cao, Thomas Do, Sheng-Fu Liang, Tzyy-Ping Jung, Chin-Teng Lin. 13549-13551 [doi]
- Pask: Providing Answer before AsKing toward Proactive AI agentZhifei Xie, Hu Zongzheng, Guibin Zhang, Jialin Zhang, Yue Liao, Chunyan Miao, Shuicheng Yan. 13552-13554 [doi]
- Event Chain-Driven Communication Strategy Generation for News VideosQinglan Wei, Ruiqi Xue, Mingyue Liao, Long Ye. 13555-13557 [doi]
- 'What Can I Cook?' LetMeCook: An LLM-Based Interactive System for Personalized Recipe GenerationShiqin Liu, Minjun Zhao, Jiajun Bu. 13558-13560 [doi]
- Streaming 3DGS Virtual Worlds in 6DoF over Next-Generation NetworksYuan-Chun Sun. 13561-13565 [doi]
- Enhancing Sports Experiences Through Video-Based InteractionsJoão Diogo. 13566-13570 [doi]
- Multi-Modal Retrieval Augmented Visual Understanding and GenerationZhucun Xue. 13571-13575 [doi]
- RoboSax Melody Slot MachineMasatoshi Hamanaka, Gou Koutaki. 13576-13578 [doi]
- 2O Painter: An Artistic Oriented Realtime Realistic Oil Painting Agent Powered by Efficient Fluid SimulationJinfan Liu, Zhangli Hu, Hanqi Chen, Ye Chen 0006, Bingbing Ni, Shuicheng Yan. 13579-13581 [doi]
- WYSIWYG: What You See Is Where Your GazeRaphaëlle Lemaire, Azamat Kaibaldiyev, Eléonore Mariette, Débora Viglieri, Alexis Lechervy, Fabrice Maurel, Gaël Dias, Jérémie Pantin, Gaëtane Blaizot, Véronique Agin, Nicolas Poirel, Eric Bui, Hervé Platel, Denis Vivien, Youssef Chahir. 13582-13584 [doi]
- MirageXuanyang Huang, Wei Huang. 13585-13586 [doi]
- Through The Mirage, Sky Meets Oculus: Rethinking Human-AI Romantic Relationships in a Posthumanist ContextMingdong Song, Yufei Huang. 13587-13588 [doi]
- Echoes of the Creator: An Immersive VR System for Spatial Storytelling and Empathy Towards Co-CreationYifan Chen. 13589-13591 [doi]
- Mixanthropy: Holographic Metamorphic CloudsMeichun Cai, Yiou Wang. 13592-13593 [doi]
- So Long: Interactive Storytelling, Embodying Collective Historical Memory, and Participatory Archiving in a VR VoyageTianxing Zhou, Chengkai Xu, Xinyue Yao. 13594-13596 [doi]
- Rhythm Gate: Invisible Conversations in the Elevator - Echoes of Material, Behavior, Memory and TransformationXia Liu, Xiao Zhang. 13597-13599 [doi]
- Transform your Smartphone in a Real-time Sonagram PlayerJean-Denis Durou, Jean Mélou, Yvain Quéau, Gilles Azzaro, Hugo Pauget Ballesteros, Gabriel Gournay, Achille Jeanvoine, Clément Lacire, Floriane Payen, Julie Remenant. 13600-13602 [doi]
- Reconstructing the Experience of Nüshu Culture: An Exploration via Multimodal Mixed Reality SystemsZheyu Feng, Boya Liu, Zhonghe Ruan, Xinyi Zhang, Zihan Gao. 13603-13605 [doi]
- Embodied Ink: A Multisensory Reinterpretation of Chinese Calligraphy Through Digital Twins and Immersive RealitiesAnna Borou Yu, Jiajian Min. 13606-13607 [doi]
- Open-CD: A Comprehensive Toolbox for Change DetectionKaiyu Li, Jiawei Jiang, Chengxi Han, Yupeng Deng 0001, Keyan Chen 0001, Zhuo Zheng, Hao Chen 0045, Ziyuan Liu, Yuantao Gu, Zhengxia Zou, Zhenwei Shi 0001, Sheng Fang 0001, Deyu Meng, Zhi Wang 0002, Xiangyong Cao. 13608-13612 [doi]
- OpenMVC: An Open-Source Library for Learning-based Multi-view CompressionHuiming Zheng, Wei Gao 0003. 13613-13617 [doi]
- SCID-Compress900: A Multi-Scene Dataset of 4K and 1080P Screen Content Images for Image Compression ResearchHuiming Zheng, Linjie Zhou, Wei Gao 0003. 13618-13622 [doi]
- adder-viz: Real-Time Visualization Software for Transcoding Event VideoAndrew C. Freeman, Luke Reinkensmeyer. 13623-13627 [doi]
- Tyee: A Unified, Modular, and Fully-Integrated Configurable Toolkit for Intelligent Physiological Health CareTao Zhou, Lingyu Shu, Zixing Zhang, Jing Han. 13628-13631 [doi]
- AudioFab: Building A General and Intelligent Audio Factory through Tool LearningCheng Zhu, Jing Han 0010, Qianshuai Xue, Kehan Wang, Huan Zhao 0003, Zixing Zhang 0001. 13632-13635 [doi]
- HiDream-I1: An Open-Source High-Efficient Image Generative Foundation ModelQi Cai, Yehao Li, Yingwei Pan, Ting Yao 0003, Tao Mei 0001. 13636-13639 [doi]
- OMAR-RQ: Open Music Audio Representation Model Trained with Multi-Feature Masked Token PredictionPablo Alonso-Jiménez, Pedro Ramoneda, Recep Oguz Araz, Andrea Poltronieri, Dmitry Bogdanov. 13640-13643 [doi]
- MeGraS: An Open-Source Store for Multimodal Knowledge GraphsLuca Rossetto, Florian Ruosch. 13644-13647 [doi]
- Video Lecture Analysis Toolkit: An Open-Source Framework for Interactive LearningTravis Seng, Axel Carlier, Wei Tsang Ooi. 13648-13651 [doi]
- Open-Source Multimedia Retrieval with vitrivr-engineRalph Gasser, Rahel Arnold, Laura Rettig, Heiko Schuldt, Raphael Waltenspül, Luca Rossetto. 13652-13655 [doi]
- PySimPace v2.0: An Easy-to-Use Simulation Tool with Machine Learning Pipelines for Realistic MRI Motion Artifact GenerationSnehil Kumar, Neil Vaughan, Zeyu Fu, Heather Wilson. 13656-13659 [doi]
- OpenAPV: Open Collaborative Innovation in Professional Video EcosystemMinsoo Park, Youngkwon Lim, Yangwoo Kim, Sam Richards, Min Woo Park, Kwang-Pyo Choi. 13660-13663 [doi]
- diveXplore - An Open-Source Software for Modern Video Retrieval with Image/Text EmbeddingsMario Leopold, Farzad Tashtarian, Klaus Schoeffmann. 13664-13668 [doi]
- Reproducibility Companion Paper: Maskable Retentive Network for Video Moment RetrievalJingjing Hu, Dan Guo 0001, Meng Wang 0001, Jiaxi Li, Fei Liu. 13669-13672 [doi]
- Reproducibility Companion Paper: NIF: A Fast Implicit Image Compression with Bottleneck Layers and Modulated Sinusoidal ActivationsLorenzo Catania, Dario Allegra, Luigi Capogrosso, Thu Nguyen. 13673-13676 [doi]
- Reproducibility Companion Paper: Swarical: An Integrated Hierarchical Approach to Localizing Flying Light SpecksHamed Alimohammadzadeh, Shahram Ghandeharizadeh, Federico Cunico, Joshua Springer. 13677-13681 [doi]
- Reproducibility Companion Paper: Enhancing Model Interpretability with Local Attribution over Global ExplorationZhiyu Zhu, Zhibo Jin, Jiayu Zhang 0001, Fang Chen 0001, Jianlong Zhou, Vijay John, Florian Spiess 0001. 13682-13685 [doi]
- AV-Deepfake1M++: A Large-Scale Audio-Visual Deepfake Benchmark with Real-World PerturbationsZhixi Cai, Kartik Kuckreja, Shreya Ghosh 0001, Akanksha Chuchra, Muhammad Haris Khan, Usman Tariq, Tom Gedeon, Abhinav Dhall. 13686-13691 [doi]
- HOLA: Enhancing Audio-visual Deepfake Detection via Hierarchical Contextual Aggregations and Efficient Pre-trainingXuecheng Wu, Heli Sun, Danlei Huang, Xinyi Yin, Yifan Wang, Hao Wang 0182, Jia Zhang 0016, Fei Wang 0128, Peihao Guo, Suyu Xing, Junxiao Xue, Liang He 0006. 13692-13699 [doi]
- Pindrop it! Audio and Visual Deepfake Countermeasures for Robust Detection and Fine-Grained LocalizationNicholas Klein, Hemlata Tak, James Fullwood, Krishna Regmi, Leonidas Spinoulas, Ganesh Sivaraman, Tianxiang Chen, Elie Khoury 0001. 13700-13706 [doi]
- KLASSify to Verify: Audio-Visual Deepfake Detection Using SSL-based Audio and Handcrafted Visual FeaturesIvan Kukanov, Jun Wah Ng. 13707-13713 [doi]
- Adversarial Attacks on Deepfake Detectors: A Challenge in the Era of AI-Generated Media (AADD-2025)Sebastiano Battiato, Mirko Casu, Francesco Guarnera, Luca Guarnera, Giovanni Puglisi, Orazio Pontorno, Claudio Vittorio Ragaglia, Zahid Akhtar. 13714-13719 [doi]
- Team RoMa @ AADD-2025: On the Generation of Transferable and Visually Imperceptible Adversarial Attacks Against Deepfake DetectorsNicolas Göller, Lukas Graner, Raphael Antonius Frick, Niklas Bunzel. 13720-13724 [doi]
- A Unified Framework for Stealthy Adversarial Generation via Latent Optimization and Transferability EnhancementGaozheng Pei, Ke Ma 0001, Dongpeng Zhang, Chengzhi Sun, Qianqian Xu 0001, Qingming Huang. 13725-13729 [doi]
- MIG-COW: Transferable Adversarial Attacks on Deepfake Detectors via Gradient DecompositionWonJune Seo, Joonhyuk Baek, Yeseong Jung, Saerom Park. 13730-13736 [doi]
- Identity-Preserving Video Generation ChallengeYiheng Zhang, Zhaofan Qiu, Qi Cai, Yehao Li, Fuchen Long, Yingwei Pan, Ting Yao 0003, Tao Mei 0001. 13737-13742 [doi]
- Identity-Preserving Text-to-Video Generation Guided by Simple yet Effective Spatial-Temporal Decoupled RepresentationsYuji Wang, Moran Li, Xiaobin Hu, Ran Yi 0002, Jiangning Zhang, Han Feng, Weijian Cao, Yabiao Wang, Chengjie Wang, Lizhuang Ma. 13743-13750 [doi]
- Identity-Preserving Text-to-Video Generation via Training-Free Prompt, Image, and Guidance EnhancementJiayi Gao, Changcheng Hua, Qingchao Chen, Yuxin Peng 0001, Yang Liu 0105. 13751-13757 [doi]
- Improving Identity Preservation in Video Generation with Multi-Branch ModelsJiahao Xu, Jianjie Luo, Zhenguo Yang. 13758-13765 [doi]
- ACM Multimedia 2025 Grand Challenge report for Image-to-Video Generation Model AccelerationJie Yang 0073, Shien Song, Jin Chen, Haoyuan Xie, Han Qi, Yifei Xue, Yizhen Lao. 13766-13770 [doi]
- LAVA Grand Challenge 2025: Benchmarking Japanese-English Document Understanding with Large Vision-Language ModelsDaichi Sato, Duc Minh Vo, Khan Md. Anwarus Salam, Hidenori Shoji, Yuma Matsuoka, Takara Taniguchi, Kaito Baba, Hideki Nakayama. 13771-13776 [doi]
- AdaDocVQA: Adaptive Framework for Long Document Visual Question Answering in Low-Resource SettingsHaoxuan Li, Wei Song, Aofan Liu, Peiwu Qin. 13777-13783 [doi]
- Hierarchical Vision-Language Reasoning for Multimodal Multiple-Choice Question AnsweringAo Zhou, Zebo Gu, Tenghao Sun, Jiawen Chen, MingSheng Tu, Zifeng Cheng, Yafeng Yin 0002, Zhiwei Jiang 0001, Qing Gu 0001. 13784-13790 [doi]
- Two-Stage Approach Using Pretrained Language Models for Question Answering on Japanese Document ImagesMizuki Yamano, Keito Fukuoka, Hisashi Miyamori. 13791-13796 [doi]
- MIRAGE25: ACM MM25 Multimodal Interleaved Reasoning and Generation ChallengeDong Chen 0017, Fei Gao, Zhengqing Hu, Xiaojun Chang. 13797-13798 [doi]
- Analyze-Prompt-Reason: A Collaborative Agent-Based Framework for Multi-Image Vision-Language ReasoningAngelos Vlachos, Giorgos Filandrianos, Maria Lymperaiou, Nikolaos Spanos, Ilias Mitsouras, Vasileios Karampinis, Athanasios Voulodimos. 13799-13805 [doi]
- LVLM-MIR: Large Vision-Language Model with Parameter-Efficient Fine-Tuning for Multimodal Interleaved ReasoningJun Yu 0001, Xilong Lu, Cong Wang 0039, Qiang Ling 0001. 13806-13812 [doi]
- IntentVC 2025: The ACM Multimedia Grand Challenge on Intention-Oriented Controllable Video CaptioningTakahiro Komamizu, Marc A. Kastner 0001, Yasutomo Kawanishi, Trung Thanh Nguyen, Junan Chen. 13813-13814 [doi]
- MGVC: MLLM-Guided Video Captioning for the IntentVC ChallengeZhiPeng Yu, Qianqian Xu 0001, Yangbangyan Jiang, Pinci Yang, Qingming Huang. 13815-13821 [doi]
- IntentVCNet: Bridging Spatio-Temporal Gaps for Intention-Oriented Controllable Video CaptioningTianheng Qiu, Jingchun Gao, Jingyu Li, Huiyi Leong, Xuan Huang, Xi Wang, Xiaocheng Zhang, Kele Xu, Lan Zhang. 13822-13829 [doi]
- CMA-VC: Large Vision-Language Model for Cross-Modal Alignment in Intention-Oriented Video CaptioningJun Yu 0001, Xilong Lu, Yunxiang Zhang, Qiang Ling 0001. 13830-13836 [doi]
- MER 2025: When Affective Computing Meets Large Language ModelsZheng Lian 0004, Rui Liu 0008, Kele Xu, Bin Liu 0041, Xuefei Liu, Yazhou Zhang 0001, Xin Liu 0012, Yong Li 0032, Zebang Cheng, Haolin Zuo, Ziyang Ma 0001, Xiaojiang Peng, Xie Chen 0001, Ya Li 0001, Erik Cambria, Guoying Zhao 0001, Björn W. Schuller, Jianhua Tao 0001. 13837-13842 [doi]
- Personality Prediction via Multimodal Fusion with Sentiment Analysis EnhancementXuerui Cheng, Feng Chen 0044, Jun Xie 0003, Kanokphan Lertniphonphan, Yi Liu, Zhepeng Wang 0002. 13843-13847 [doi]
- Affective-CoT: Decomposing Multimodal Emotion Reasoning through a Hierarchical Cognitive WorkflowYuesheng Huang, Jinming Liu, Jiajia Chen, Yihang Lin, Yanmei Chen, Jianwei Dong. 13848-13855 [doi]
- UniEmotion: A Unified Framework for Multimodal Emotion Recognition with Iterative Consensus-based TrainingYanjie Sun, Wuyang Chen 0002, Yong Dou. 13856-13863 [doi]
- Agent-MER: A Cognitive Agent with Hierarchical Deliberation for Open-Vocabulary Multimodal Emotion RecognitionZhengqin Lai, Zhilin Zhu 0001, Xiaopeng Hong, Yaowei Wang 0001. 13864-13871 [doi]
- The ACM Multimedia 2025 Grand Challenge of Truthful and Responsible Multimodal LearningXudong Han, Kai Liu, Yanlin Li, Hao Li 0093, Zheng Wang 0007. 13872-13873 [doi]
- DeepSIX at ACM MM 2025 Grand Challenge: Enhancing Context Text Processing for Multimodal Hallucination Detection and Fact VerificationHoang Chu, Huy Chu, Tan Minh Nguyen, Son T. Luu, Cuong Hoang, Hiep Nguyen, Vu Tran, Le-Minh Nguyen 0001. 13874-13880 [doi]
- HKD4VLM: A Progressive Hybrid Knowledge Distillation Framework for Robust Multimodal Hallucination and Factuality Detection in VLMsZijian Zhang, Xuecheng Wu, Danlei Huang, Siyu Yan, Chong Peng, Xuezhi Cao. 13881-13887 [doi]
- Unified Dual-Strategy Framework for Multi-Task Visual Question AnsweringShuoping Yang, Jun Yu 0001. 13888-13894 [doi]
- Assessing Personality Traits and Interview Performance from Asynchronous Video InterviewsTianyi Zhang, Tianhua Qi, Antonis Koutsoumpis, Yuan Zong, Wenming Zheng, Janneke K. Oostrom, Djurre Holtrop, Zhaojie Luo, Reinout E. de Vries. 13895-13900 [doi]
- Traits Run Deep: Enhancing Personality Assessment via Psychology-Guided LLM Representations and Multimodal Apparent BehaviorsJia Li 0013, Yichao He, Jiacheng Xu, Tianhao Luo, Zhenzhen Hu 0004, Richang Hong, Meng Wang 0001. 13901-13908 [doi]
- Listening to the Unspoken: Exploring '365' Aspects of Multimodal Interview Performance AssessmentJia Li 0013, Yang Wang 0023, Wenhao Qian, Jialong Hu, Zhenzhen Hu 0004, Richang Hong, Meng Wang 0001. 13909-13916 [doi]
- Enhancing Multimodal Personality Assessment with LLM-Augmented Hierarchical FusionLongjiang Yang, Cong Yu, Chenxi Huang, Fengyu Zhang, Ran Liu, Zhuofan Wen 0001, Shun Chen, Hailiang Yao, Bin Liu 0041, Zheng Lian 0004, Jianhua Tao 0001. 13917-13923 [doi]
- The First MPDD Challenge: Multimodal Personality-aware Depression DetectionChangzeng Fu, Zelin Fu, Qi Zhang, Xinhe Kuang, Jiacheng Dong, Kaifeng Su, Yikai Su, Wenbo Shi, Junfeng Yao, Yuliang Zhao, Shiqi Zhao, Jiadong Wang, Siyang Song, Chaoran Liu, Yuichiro Yoshikawa, Björn W. Schuller, Hiroshi Ishiguro. 13924-13929 [doi]
- DepFormer: A Unified Framework with Bimodal Collaborative Transformer for Depression DetectionFangyuan Liu, Sirui Zhao, Kang Yin, Tong Xu 0001, Enhong Chen. 13930-13936 [doi]
- HOPE: Hierarchical Fusion for Optimized and Personality-Aware Estimation of DepressionHanlei Shi, Yu Liu, Haoxun Li, Yuxuan Ding, Jiaxi Hu, Leyuan Qu, Taihao Li. 13937-13943 [doi]
- Multi-Level Segment Fusion Based on Adaptive Time-Window Selection for Multimodal Personality-Aware Elderly Depression DetectionYuyun Liu, Kaifei Zhang, Yinghao Ma, Xiaolin Xu, Tianhua Qi, Wenming Zheng, Cheng Lu 0005, Yuan Zong. 13944-13950 [doi]
- MEGC2025: Micro-Expression Grand Challenge on Spot Then Recognize and Visual Question AnsweringXinqi Fan, Jingting Li 0001, John See, Moi Hoon Yap, Wen-Huang Cheng, Xiaobai Li, Xiaopeng Hong, Su-Jing Wang, Adrian K. Davison. 13951-13956 [doi]
- HierMEQA: A Relationship-Aware Hierarchical Framework for Consistent Micro-Expression Visual Question AnsweringLingsi Zhu, Yanjun Chi, Jun Yu 0001, Gongpeng Zhao, Yuefeng Zou, Fengzhao Sun, Xilong Lu. 13957-13963 [doi]
- Boosting Micro-Expression Analysis via Prior-Guided Video-Level RegressionZizheng Guo 0002, Bochao Zou, Yinuo Jia, Xiangyu Li, Huimin Ma 0001. 13964-13971 [doi]
- Emotion-Qwen-VL: A Fully Fine-Tuned Multimodal Large Language Model for Micro-Expression Visual Question AnsweringYujing Wang, Ruotong Fang, Xing Huang, Zhiyuan Han, Xiaoqing Lin, Yuhao Shan, Tong Chen 0008. 13972-13978 [doi]
- REACT 2025: the Third Multiple Appropriate Facial Reaction Generation ChallengeSiyang Song, Micol Spitale, Xiangyu Kong 0001, Hengde Zhu, Cheng Luo, Cristina Palmero, Germán Barquero, Sergio Escalera, Michel F. Valstar, Mohamed Daoudi, Tobias Baur 0001, Fabien Ringeval, Andrew Howes 0001, Elisabeth André, Hatice Gunes. 13979-13984 [doi]
- Scattering-Conditioned Diffusion Models for Multiple Appropriate Facial Reaction GenerationQirong Mao, Qiwei Wu, Na Liu, Yakui Ding, Lijian Gao. 13985-13991 [doi]
- Multiple Appropriate Facial Reaction Generation Based on Multi-View Transformation of Speaker VideoJiajian Huang, Zitong Yu. 13992-13996 [doi]
- Explaining Listener Reactions: Personality-Guided Facial Response Generation with Cross-Modal AttentionPeng Wang, Pujun Xue, Xiaofeng Liu, Tongjuan Ji. 13997-14003 [doi]
- The ACM Multimedia 2025 Grand Challenge of Avatar-based Multimodal Empathetic ConversationHan Zhang, Hao Fei 0001, Hong Han 0001, Lizi Liao, Erik Cambria, Min Zhang 0005. 14004-14005 [doi]
- E3RG: Building Explicit Emotion-driven Empathetic Response Generation System with Multimodal Large Language ModelRonghao Lin, Shuai Shen, Weipeng Hu, Qiaolin He, Aolin Xiong, Li Huang, Haifeng Hu 0001, Yap-Peng Tan. 14006-14013 [doi]
- EMO-Avatar: An LLM-Agent-Orchestrated Framework for Multimodal Emotional Support in Human AnimationKeqi Chen, Wenxin Fu, Qihang Lu, Zekai Sun, Yizhong Geng, Yi Liu, Puyuan Guo, Yingming Gao, Ya Li 0001. 14014-14020 [doi]
- MERIA: Empathetic Response Generation via Parallel Disentanglement and Uncertainty-Gated FusionChenhao Dang, Zeyuan Zhu. 14021-14027 [doi]
- The 2025 Grand Challenge on Multimedia Verification: Foundations and OverviewDuc-Tien Dang-Nguyen, Morten Dahlback Langfeldt, Henrik Brattli Vold, Silje Førsund, Minh-Son Dao, Sohail Ahmed Khan, Kha-Luan Pham, Marc Gallofré Ocaña, Minh-Triet Tran, Anh Duy Tran. 14028-14033 [doi]
- Multimedia Verification Through Multi-Agent Deep Research Multimodal Large Language ModelsHuy Hoan Le, Van Sy Thinh Nguyen, Thi Le Chi Dang, Vo Thanh Khang Nguyen, Truong Thanh Hung Nguyen, Hung Cao. 14034-14040 [doi]
- Ægis: AI-Enhanced OSINT for Multimedia VerificationMinh-Anh Pham, Anh-Tai Pham-Nguyen, Anh Duy Le, Duc-Tuan Luu, Thanh-Hai Tran, Anh Duy Tran, Duc-Tien Dang-Nguyen. 14041-14047 [doi]
- Fact-Checking at Scale: Multimodal AI for Authenticity and Context Verification in Online MediaVan-Hoang Phan, Tung-Duong Le-Duc, Long-Khanh Pham, Anh-Thu Le, Quynh-Huong Dinh-Nguyen, Dang-Quan Vo, Hoang-Quoc Nguyen-Son, Anh Duy Tran, Dang-Vu, Minh-Son Dao. 14048-14054 [doi]
- SMPV: Social Media Prediction for VideosBo Wu 0018, Peiye Liu, Qiushi Huang, Zhaoyang Zeng, Jia Wang 0020, Bei Liu 0001, Jiebo Luo 0001, Wen-Huang Cheng. 14055-14057 [doi]
- Cross-Modal Prototype Augmentation and Dual-Grained Prompt Learning for Social Media Popularity PredictionAo Zhou, MingSheng Tu, LuPing Wang, Tenghao Sun, Zifeng Cheng, Yafeng Yin 0002, Zhiwei Jiang 0001, Qing Gu 0001. 14058-14065 [doi]
- FAME: Fusion-Aware Multi-modal Ensemble for Social Media Popularity PredictionYan Zhuang, Wei Bai, Yanru Zhang, Minhao Liu, Jiawen Deng, Fuji Ren. 14066-14072 [doi]
- Modality-Aligned Hierarchical Attention Network for Multi-Modal Popularity Prediction on Social MediaWenzheng Hou, Weixin Li 0001. 14073-14078 [doi]
- MVP: Winning Solution to SMP Challenge 2025 Video TrackLiliang Ye, Yunyao Zhang, Yafeng Wu, Yi-Ping Phoebe Chen, Junqing Yu, Wei Yang 0034, Zikai Song. 14079-14085 [doi]
- Higher-Order Vision-Language Fusion for Video Popularity PredictionKele Xu, Qisheng Xu, Binli Luo, Han Zhou 0003, Zengming Lin, Hui Geng, Xianhan Tan. 14086-14093 [doi]
- Anchoring Trends: Mitigating Social Media Popularity Prediction Drift via Feature Clustering and ExpansionChia-Ming Lee, Bo-Cheng Qiu, Cheng-Jun Kang, Yi Hsuan Wu, Jun-Lin Chen, Yu-Fan Lin, Yi-Shiuan Chou, Chih-Chung Hsu. 14094-14100 [doi]
- The ACM Multimedia 2025 Grand Challenge of Multimodal Conversational Aspect-based Sentiment AnalysisMeng Luo 0010, Hao Fei 0001, Bobo Li 0001, Shengqiong Wu, Qian Liu 0012, Soujanya Poria, Erik Cambria, Mong-Li Lee, Wynne Hsu. 14101-14106 [doi]
- Structured Prompting and LLM Ensembling for Multimodal Conversational Aspect-based Sentiment AnalysisZhiqiang Gao, Shihao Gao, Zixing Zhang, Yihao Guo, Hongyu Chen, Jing Han. 14107-14113 [doi]
- SDG-MLLM: Injecting Structured Dialogue Graphs into MLLM for Multimodal Conversational Aspect-Based Sentiment AnalysisXinjing Liu, Pengyue Lin, Xinyu Tu, Wenqi Jia 0006, Chen Jiang, Ruifan Li. 14114-14121 [doi]
- A Two-Stage Full Fine-Tuning and LLM Post-processing Framework for MCABSADeyuan Chen, Xiaocui Yang, Shi Feng 0001, Zihan Cheng, Daling Wang, Yifei Zhang 0003. 14122-14129 [doi]
- ERR@HRI 2.0 Challenge: Multimodal Detection of Errors and Failures in Human-Robot Conversations. Shiye Cao, Maia Stiber, Amama Mahmood, Maria Teresa Parreira, Wendy Ju, Micol Spitale, Hatice Gunes, Chien-Ming Huang 0001. 14130-14135 [doi]
- Beyond Technical Failures: Multimodal Time-Series Modelling for Detecting Social Breakdowns and User Repair Attempts in Human-Robot Interaction. Rutherford Agbeshi Patamia, Ha Pham Thien Dinh, Ming Liu 0028, Akansel Cosgun. 14136-14142 [doi]
- Multimodal Time Series Alignment for Error Detection in Human Robot Interactions. Xun Jiang 0001, Shuangle Li, Chong Liu, Xing Xu 0001. 14143-14149 [doi]
- MultiMediate '25: Cross-cultural Multi-domain Engagement Estimation. Daksitha Senel Withanage Don, Marius Funk, Michal Balazia, Huajian Qiu, Shogo Okada, François Brémond, Jan Alexandersson, Andreas Bulling, Elisabeth André, Philipp Müller 0001. 14150-14155 [doi]
- LVLM-HBA: Large Vision-Language Model with Cross-Modal Alignment for Human Behavior Analysis. Jun Yu 0001, Xilong Lu, Lingsi Zhu, Qiang Ling 0001. 14156-14162 [doi]
- Heterogeneous Encoder Fusion with KAN Decoder for Group Engagement Modeling via 8× Sliding Pipelines. Yuefeng Zou, Hui Zhang, Jun Yu 0001, Keda Lu, Linsi Zhu, Fengzhao Sun, Bo Wang, Kun Yao, Jianqing Sun, Jiaen Liang. 14163-14169 [doi]
- Generalizable Engagement Estimation in Conversation via Domain Prompting and Parallel Attention. Yangchen Yu, Yin Chen, Jia Li 0013, Peng Jia, Yu Zhang 0082, Li Dai, Zhenzhen Hu 0004, Meng Wang 0001, Richang Hong. 14170-14177 [doi]
- ACM Multimedia Grand Challenge on ENT Endoscopy Analysis. Trong Thuan Nguyen, Viet-Tham Huynh, Thao Thi Phuong Dao, Mai-Khiem Tran, Ha Nguyen Thi, Tien To Vu Thuy, Uyen Hanh Tran, Tam V. Nguyen 0002, Minh-Triet Tran, Thanh Dinh Le. 14178-14183 [doi]
- HyMoENet: Mixture-of-Experts Enhanced CNN-Transformer Hybrid Framework for Classifying Anatomical Sites in Endoscopic ENT Images. Trong Nhan Nguyen, Luan L. M. Nguyen, Phat-Dat To, Tran-Quoc Duy Nguyen, Anh Huy Nguyen, Tuan Pham-Dang, Chu Lam Nguyen, Duy V. M. Nguyen. 14184-14189 [doi]
- Enhancing Endoscopic Image Retrieval via Self-Supervised Learning and Large VLM-Based Re-ranking. Khoa Tran, Linh Ly, Duy Khanh Ho, Ngoc Hoang Luong. 14190-14196 [doi]
- Multi-Level CLS Token Fusion for Contrastive Learning in Endoscopy Image Classification. Y. Hop Nguyen, Doan Anh Phan Huu, Trung Thai Tran, Nhat Nam Mai, Van Toi Giap, Thao Thi Phuong Dao, Trung-Nghia Le. 14197-14203 [doi]
- GroMo25: ACM Multimedia 2025 Grand Challenge for Plant Growth Modeling with Multiview Images. Shreya Bansal, Ruchi Bhatt, Amanpreet Chander, Rupinder Kaur, Malya Singh, Mohan Kankanhalli, Abdulmotaleb El-Saddik, Mukesh Saini. 14204-14209 [doi]
- ViewSparsifier: Killing Redundancy in Multi-View Plant Phenotyping. Robin-Nico Kampa, Fabian Deuser, Konrad Habel, Norbert Oswald. 14210-14215 [doi]
- MAC 2025: The 2nd Micro-Action Analysis Grand Challenge. Kun Li 0008, Dan Guo 0001, Xiaobai Li, Haoyu Chen 0001, Pengyu Liu 0005, Fei Wang 0073, Jingjing Hu, Guoying Zhao 0001, Meng Wang 0001. 14216-14221 [doi]
- Progressive Large-Scale Modeling via Temporal-Spatial Focus Connector for Micro-Action Recognition. Qiankun Li, Qiupu Chen, Huabao Chen, Feng He, Depeng Li 0001, Zhigang Zeng. 14222-14228 [doi]
- Combatting Data Imbalance and Noise in Micro-Action Recognition. Chuang Wang, Weidong Chen 0010, Xu Cui, Yiming Zhao, Zhaobo Qi, Pengqi Huang, Xinyan Liu, Weigang Zhang. 14229-14235 [doi]
- Hierarchical Multi-Feature Extraction and Aggregation for Micro-Action Recognition. Zhichao Xia, Yichi Zhang, Yanjun Chi, Lingsi Zhu, Mohan Jing, Jun Yu 0001. 14236-14243 [doi]
- Event-Enriched Image Analysis Grand Challenge At ACM Multimedia 2025. Thien Phuc Tran, Minh-Quang Nguyen, Minh-Triet Tran, Tam V. Nguyen 0002, Trong-Le Do, Duy-Nam Ly, Viet-Tham Huynh, Khanh-Duy Le, Mai-Khiem Tran, Trung-Nghia Le. 14244-14249 [doi]
- ENRIC: EveNt-AwaRe Captioning with Image Retrieval via UnCertainty-Guided Re-ranking and Semantic Ensemble Reasoning. Nam Quan Nguyen, Minh-Hoang Le 0001, Vinh-Toan Vong, Minh-Triet Tran. 14250-14256 [doi]
- EVENT-Retriever: Event-Aware Multimodal Image Retrieval for Realistic Captions. Dinh-Khoi Vo, Van Loc Nguyen, Minh-Triet Tran, Trung-Nghia Le. 14257-14263 [doi]
- ReCap: Event-Aware Image Captioning with Article Retrieval and Semantic Gaussian Normalization. Thinh Phuc Nguyen, Thanh Hai Nguyen, Gia Huy Dinh, Lam-Huy Nguyen, Minh-Triet Tran, Trung-Nghia Le. 14264-14270 [doi]
- Overview of the First CASTLE Grand Challenge at ACM Multimedia 2025. Luca Rossetto, Werner Bailer, Cathal Gurrin, Duc-Tien Dang-Nguyen, Klaus Schoeffmann, Allie Tran. 14271-14272 [doi]
- Interactive Retrieval System for Multi-Stream Collections: multiXview at CASTLE 2025 Interactive Grand Challenge. Omar Shahbaz Khan, Ujjwal Sharma 0001, Gonçalo Marcelino, Aaron Duane, Stevan Rudinac, Marcel Worring, Björn Þór Jónsson 0001. 14273-14279 [doi]
- Extending Lifelog Retrieval to Multi-stream Video Retrieval at the CASTLE Challenge 2025. Quang-Linh Tran, Hoang Bao Le, Thang-Long Nguyen-Ho, Graham Healy, Liting Zhou, Allie Tran. 14280-14285 [doi]
- SUMAC '25: 7th Workshop on analySis, Understanding and proMotion of heritAge Contents: Advances in Machine Learning, Signal Processing, Multimodal Techniques and Human-machine Interaction. Valérie Gouet-Brunet, Edgar Roman-Rangel, Li Weng. 14286-14287 [doi]
- 8th ACM International Workshop on Multimedia Content Analysis in Sports (ACM MMSports'25). Rainer Lienhart, Thomas B. Moeslund, Hideo Saito 0001. 14288-14290 [doi]
- Intelligent Immersification in the Metaverse: AI-Driven Immersive Multimedia. Aik Beng Ng, Yethoven Tukimin, Jeannie S. Lee, Megani Rajendran, Chek Tien Tan, Indriyati Atmosukarto. 14291-14292 [doi]
- MCHM25: Multimedia Computing for Health and Medicine. Wei Zhou 0021, Hadi Amirpour, Li Yu 0004, Jungong Han, Richang Hong, Paul L. Rosin. 14293-14295 [doi]
- (RichMediaGAI'25) 3rd International Workshop on Rich Media with Generative AI. Wei Jiang 0001, Zhenghao Chen, Dong Xu 0001. 14296-14298 [doi]
- AIQAM'25: The 2nd ACM Workshop on AI-powered Question Answering Systems for Multimedia. Tai Tan Mai, Allie Tran, Quang-Linh Tran, An Nguyen, Hoang Nguyen, Tho Quan, Duc-Tien Dang-Nguyen, Cathal Gurrin. 14299-14301 [doi]
- DHOW '25: 2nd International Workshop on Diffusion of Harmful Content on Online Web. Amit Kumar Jaiswal 0001, Thomas Mandl 0001, Gautam Kishore Shahi, Durgesh Nandini, Haiming Liu 0002. 14302-14304 [doi]
- GENEA Workshop 2025: The 6th Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents. Taras Kucherenko, Alice Delbosc, Rajmund Nagy, Laura B. Hensel, Youngwoo Yoon, Oya Çeliktutan, Gustav Eje Henter. 14305-14307 [doi]
- MUWS 2025: The 4th International Workshop on Multimodal Human Understanding for the Web and Social Media. Sherzod Hakimov, David Semedo, Eric Müller-Budack, Marc A. Kastner 0001, Takahiro Komamizu. 14308-14310 [doi]
- RoboSoft'25: The 1st International Workshop on Vision-Language in Soft Robot. Ziyu Wei, Luting Wang 0001, Chen Gao 0005, Hongliang Huang, Jiaqi Liu, Li Wen, Si Liu 0001. 14311-14313 [doi]
- MRAC 2025: 3rd International Workshop on Multimodal, Generative and Responsible Affective Computing. Zheng Lian 0004, Shreya Ghosh 0001, Erik Cambria, Zhixi Cai, Guoying Zhao 0001, Abhinav Dhall, Björn W. Schuller, Roland Goecke, Jianhua Tao 0001, Tom Gedeon. 14314-14316 [doi]
- (DFF '25) 1st Deepfake Forensics Workshop: Detection, Attribution, Recognition, and Adversarial Challenges in the Era of AI-Generated Media. Sebastiano Battiato, Mirko Casu, Francesco Guarnera, Luca Guarnera, Giovanni Puglisi, Orazio Pontorno, Claudio Vittorio Ragaglia, Zahid Akhtar. 14317-14319 [doi]
- 3A). Zheng Wang 0046, Qianqian Chen, Yiyang Luo, Zhiqiu Ye, Shi Wei, Hanwang Zhang, Tat-Seng Chua. 14320-14322 [doi]
- CogMAEC'25: The 1st Workshop on Cognition-oriented Multimodal Affective and Empathetic Computing. Hao Fei 0001, Bobo Li 0001, Meng Luo 0010, Qian Liu 0012, Lizi Liao, Fei Li 0021, Min Zhang 0005, Björn W. Schuller, Mong-Li Lee, Erik Cambria. 14323-14325 [doi]
- APP3DV'25: ACM Multimedia - International Workshop on Application-driven Point Cloud Processing and 3D Vision. Wei Gao 0003, Sam Kwong, Zhu Li 0001, Shan Liu 0001, Ge Li 0002. 14326-14328 [doi]
- McGE '25: The 3rd International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice. Cheng Jin 0001, Mingli Song, Rui Wang 0032, Xingjiao Wu. 14329-14330 [doi]
- MSMA'2025: The 1st International Workshop on Multi-Sensorial Media and Applications. Tiesong Zhao, Qian Liu 0001, Zhisheng Yan. 14331-14332 [doi]
- IXR '25: 3rd International Workshop on Interactive eXtended Reality. Irene Viola 0001, Silvia Rossi 0001, Marta Orduna, Maria Torres Vega. 14333-14334 [doi]
- MMFood'25: 1st International Workshop on Multi-modal Food Computing. Lipika Dey, Marianna Obrist, Stavroula G. Mougiakakou. 14335-14337 [doi]
- Multimodal Learning for Spatio-Temporal Data Mining. Siru Zhong, Xixuan Hao, Hao Miao 0001, Yan Zhao 0008, Qingsong Wen, Roger Zimmermann, Yuxuan Liang 0002. 14338-14339 [doi]
- Perceptual Visual Quality Assessment in Multimedia Communication. Wei Zhou 0021, Hadi Amirpour. 14340-14341 [doi]
- Reasoning and Planning for Multimodal Large Language Models: A Multilingual and Cross-Domain Exploration. Sarmistha Das, Akash Ghosh, Sriparna Saha 0001, Koustava Goswami, K. J. Joseph. 14342-14343 [doi]
- Combating Online Misinformation Videos: Characterization, Detection, and Prevention. Qiang Sheng 0001, Peng Qi 0005, Tianyun Yang, Yuyan Bu, Wynne Hsu, Mong-Li Lee, Juan Cao 0001. 14344-14345 [doi]
- Video Question Answering and Beyond. Yicong Li 0004, Junbin Xiao, Angela Yao, Tat-Seng Chua. 14346-14347 [doi]
- AI-based Multimedia Data Compression: Perception Utility Optimization and Standardization. Wei Gao 0003, Ge Li 0002. 14348-14349 [doi]
- An Innovative Industry Program on Multimedia in A New AI Era. Jianquan Liu, Balu Adsumilli, Yukiko Yanagawa, Haiwei Dong 0001. 14351-14352 [doi]
- How Generative AI Understands the Balance of Energy, Efficiency, and Human Experience. Tomoya Sawada. 14353 [doi]
- Video Content Restoration in the Wild: Challenges and Opportunities. Guan-Ming Su. 14354 [doi]
- MedAI Hub: A Multimodal Medical Data Platform with Evolutionary Image Enhancement and Graph-Driven Literature Retrieval. Guoming Wang. 14355 [doi]
- Will AI Make Agencies Obsolete? Rethinking the Future of Advertising. Aleksandr Farseev. 14356 [doi]
- SOMIN: An Explainable AI and LLM Platform for Real-Time, Data-Driven Digital Marketing Strategy. Aleksandr Farseev. 14357 [doi]
- A Streamlined System for Multimodal Industrial Anomaly Detection via 2D and 3D Feature Fusion. Wenbing Zhu, Mingmin Chi, Bo Peng. 14358 [doi]
- IDPFlow: A No-Code Agentic Framework for Multimodal Intelligent Document Processing. Goutham Vignesh, Harikrishnan P. M., Siddartha Reddy, Saisubramaniam Gopalakrishnan, Vishal Vaddina. 14359-14360 [doi]
- XReco Platform and RAI News Media Demonstrator. Roberto Iacoviello, Alberto Ciprian, Alberto Messina, Maurizio Montagnuolo, Davide Zappia. 14361 [doi]
- Real-time GenAI Solutions for Video Streaming in Low-bandwidth Settings. Claudio Baecchi, Matteo Bruni, Fabio Clabot, Marco Bertini 0001. 14362-14363 [doi]
- Solving Critical Real-World Business Challenges - NEC's Industrial Research Model in the AI Era. Yasunori Mochizuki. 14364 [doi]
- Media integrity and literacy in the age of GenAI & Deepfakes. Christoph Bregler. 14365 [doi]
- Advancing Lung Cancer Diagnosis with eyonis® LCS. Benoit Huet. 14366-14367 [doi]
- Research and Standardization Trends in Compression and Transmission Technologies for 3D Point Cloud. Keisuke Nonaka. 14368 [doi]
- To Advance People's Well-Being: Human health sensing, analysis, and applications. Terumi Umematsu. 14369 [doi]
- Spark LLM and the Scientific Research it empowers: Practice and Thoughts. Xin Li. 14370 [doi]
- Multimodal Content Creation, Consumption and Distribution. Ting Yao 0003. 14371 [doi]
- Toward Fast and Exact Machine Learning Platform for Big Data. Yasuhiro Fujiwara. 14372 [doi]
- Sovereign & Shared: Frugally Scalable Multilingual-Multimodal AI for Bharat. Maneesh Kumar Singh. 14373 [doi]
- Google Industry Seminar: Video Processing in the New Age of AI. Balu Adsumilli, Jianle Chen, In Suk Chong, Yilin Wang 0001. 14374-14375 [doi]
- HEAR: A Holistic Extraction and Agentic Reasoning Framework for Document Understanding. Longfeng Chen, Zheng Xiao, Juyuan Wang, Zeyu Huang, Yawen Zeng, Jin Xu. 14376-14382 [doi]