- Weakly Supervised Intracranial Aneurysm Detection and Segmentation in MR Angiography via Multi-Task UNet with Vesselness Prior. Erin Rainville, Amirhossein Rasoulian, Hassan Rivaz, Yiming Xiao. 1-10
- From Emotions to Violence: Multimodal Fine-Grained Behavior Analysis at the 9th ABAW. Dimitrios Kollias, Stefanos Zafeiriou, Irene Kotsia, Gregory G. Slabaugh, Damith Chamalke Senadeera, Jianian Zheng, Kaushal K. K. Yadav, Chunchang Shao, Guanyu Hu 0003. 1-12
- On the Use of Hierarchical Vision Foundation Models for Low-Cost Human Mesh Recovery and Pose Estimation. Shuhei Tarashima, Yushan Wang, Norio Tagawa. 13-24
- Valeo Near-Field: A Novel Dataset for Pedestrian Intent Detection. Antonyo Musabini, Rachid Benmokhtar, Xavier Perrotton, Jagdish Bhanushali, Victor Galizzi, Bertrand Luvison. 25-32
- Sweating the Stress: GSR Insights into Cognitive Load and Distraction. Santiago Clemente, Layla Varghese, Valentina Nino, Maria Valero. 33-41
- A Multimodal Dataset of Viewer Responses to Japanese Manzai Comedy. Kazuki Kawamura, Kengo Nakai, Jun Rekimoto. 42-50
- DEAP DIVE: Dataset Investigation with Vision Transformers for EEG Evaluation. Annemarie Hoffsornmer, Helen Schneider, Svetlana Pavlitska, Johann Marius Zöllner. 51-60
- Dynamic Temporal Gating Networks for Cross-Modal Valence-Arousal Estimation. Yubeen Lee, Sangeun Lee, Chaewon Park, Junyeop Cha, Eunil Park. 61-70
- Zero-Shot Multimodal Compound Expression Recognition Approach Using Off-the-Shelf Large Visual-Language Models. Elena Ryumina, Maxim Markitantov, Alexandr Axyonov, Dmitry Ryumin, Mikhail Dolgushin, Alexey Karpov 0001. 71-79
- Face Video Steganography for Privacy-Protection Automatic Depression Assessment. Xinyi Ni, Zijian Wu, Lu Liu, Siyang Song. 80-86
- VelocityNet: Real-Time Crowd Anomaly Detection via Person-Specific Velocity Analysis. Fatimah Alghamdi, Omar Alharbi, Abdullah AlDwyish, Raied Aljadaany, Muhammad Kamran J. Khan, Huda Alamri. 87-94
- Multi-Task Learning for Joint Action and Gesture Recognition. Konstantinos Spathis, Nikolaos Kardaris, Petros Maragos. 95-104
- Experimental Evaluation of the Impact of Robot Path Shape and Speed on Human Affective States in a Hallway-Passing Scenario. Benjamin Greenberg, Michael Grossi, Sai Likhith Karri, Jingang Yi, Jacob Feldman, Karin Stromswold. 105-112
- Aligning Multimodal Data for Fine-Grained Video Understanding via Cross-Attentive Recurrent Fusion. Nam-Ho Kim, Jun-Hwa Kim. 113-119
- Rotation Estimation of Multiple In-Vehicle Cameras from Dual FoV. Jun Sato, Fumihiko Sakaue, Shunya Kumano, Yusuke Ueda, Naoki Kawasaki. 120-129
- Sem-MASt3R: Semantically Guided Feature Matching with MASt3R. Dario Tenore, Daniel Barath, Marc Pollefeys, Qunjie Zhou. 130-139
- $f$ COP: Monocular Focal Length Estimation from Category-Level Object Priors. Xinyue Zhang, Jiaqi Yang, Xiangting Meng, Abdelrahman Mohamed, Laurent Kneip. 140-148
- Optic Illusion Patterns for Accurate Camera Calibration. Gaku Nakano, Takenobu Kiyama. 149-158
- Model Ensemble to Fuse Geometric and Learning Solutions for Camera Rotation Estimation. Bhuvan Aggarwal, Amit More, Mudit Soni, S. Divakar Bhat. 159-168
- Direct Camera Calibration from Vanishing Points via Polynomial Solvers. Norio Kosaka. 169-177
- Generalizable Visual Localization for Gaussian Splatting Scene Representations. Fadi Khatib, Dror Moran, Guy Trostianetsky, Yoni Kasten, Meirav Galun, Ronen Basri. 178-189
- FUSELOC: Fusing Global and Local Descriptors for Fast and Robust 2D-3D Matching in Visual Localization. Son Tung Nguyen, Alejandro Fontán, Michael Milford, Tobias Fischer 0001. 190-199
- Alignment Scores: Robust Metrics for Multiview Pose Accuracy Evaluation. Seong-Hun Lee, Javier Civera 0001. 200-209
- Benchmarking Feature Upsampling Methods for Vision Foundation Models Using Interactive Segmentation. Volodymyr Havrylov, Haiwen Huang, Dan Zhang, Andreas Geiger 0001. 210-219
- Motion-Refined DINOSAUR for Unsupervised Multi-Object Discovery. Xinrui Gong, Oliver Hahn 0001, Christoph Reich, Krishnakant Singh, Simone Schaub-Meyer, Daniel Cremers, Stefan Roth 0001. 220-230
- ImageNet-BG: A Toolkit and Dataset for Evaluating Vision Model Robustness Against Background Variations. Lukasz Piekarek, Kamil Szyc. 231-240
- From Global to Local: Social Bias Transfer in CLIP. Ryan Ramos, Yusuke Hirota, Yuta Nakashima, Noa Garcia. 241-250
- Robust Experts: the Effect of Adversarial Training on CNNs with Sparse Mixture-of-Experts Layers. Svetlana Pavlitska, Haixi Fan, Konstantin Ditschuneit, J. Marius Zöllner. 251-260
- COOkeD: Ensemble-Based OOD Detection in the Era of Zero-Shot CLIP. Galadrielle Humblot-Renaux, Gianni Franchi, Sergio Escalera, Thomas B. Moeslund. 261-271
- Are X-Ray Landmark Detection Models Fair? A Preliminary Assessment and Mitigation Strategy. Roberto Di Via, Massimiliano Ciranni, Davide Marinelli, Allison Clement, Nikil Patel, Julian Wyatt, Francesca Odone, Matteo Santacesaria, Irina Voiculescu, Vito Paolo Pastore. 272-278
- Is It Certainly a Deepfake? Reliability Analysis in Detection & Generation Ecosystem. Neslihan Kose, Anthony Rhodes, Umur Aybars Ciftci, Ilke Demir. 279-289
- Explain with Confidence: Fusing Saliency Maps for Faithful and Interpretable Weakly-Supervised Model. Ayush Somani, Arif Ahmed Sekh, Dilip K. Prasad. 290-299
- Data Bias Mitigation and Evaluation Framework for Diffusion-Based Generative Face Models. Hao Yu 0014, Margrit Betke, Sarah Adel Bargal. 300-310
- Extracting Uncertainty Estimates from Mixtures of Experts for Semantic Segmentation. Svetlana Pavlitska, Beyza Keskin, Alwin Faßbender, Christian Hubschneider, J. Marius Zöllner. 311-320
- GenAI Confessions: Black-Box Membership Inference for Generative Image Models. Matyas Bohacek, Hany Farid. 321-330
- Enhancing Vision-Language Models for Zero-Shot Video Action Recognition via Visual-Textual Refinement and Improved Interpretability. Adeel Yousaf, Mubarak Shah. 331-340
- DiViD: Disentangled Video Diffusion for Static-Dynamic Factorization. Marzieh Gheisari, Auguste Genovesio. 341-350
- FusionGen: Feature Fusion-Based Few-Shot EEG Data Generation. Yuheng Chen, Dingkun Liu, Xinyao Yang, Xinping Xu, Baicheng Chen, Dongrui Wu. 351-360
- Textual Semantics Matters: Unsupervised Representation Disentanglement in Realistic Scenarios with Language Inductive Bias. Junhao Geng, Lexiang Lv, Jianxin Lin. 361-370
- Why Compress what you can Generate? When GPT-4o Generation Ushers in Image Compression Fields. Yixin Gao, Xiaohan Pan, Xin Li 0082, Zhibo Chen 0001. 371-381
- A Guided Fine-Tuning Framework for Diffusion Models With Disentangled Semantic Priors for Multi-Factor Image Editing. Baorui Peng, Zhongming Chen. 382-387
- Controllable Generation with Disentangled Representative Learning of Multiple Perspectives in Autonomous Driving. Haoran Jin. 388-395
- WP-CLIP: Leveraging CLIP to Predict Wölfflin's Principles in Visual Art. Abhijay Ghildyal, Li-Yun Wang, Feng Liu. 396-405
- VQArt-Bench: A Semantically Rich VQA Benchmark for Art and Cultural Heritage. Andrea Alfarano, Lorenzo Venturoli, Dario Negueruela del Castillo. 406-416
- Leveraging Diffusion Models for Stylization Using Multiple Style Images. Dan Ruta, Abdelaziz Djelouah, Raphael Ortiz, Christopher Schroers. 417-426
- Composite Reflections. Abhishek Dangeti, Pavan Gajula, Vikram Jamwal. 427-435
- Controllable Single-Shot Animation Blending with Temporal Conditioning. Eleni Tselepi, Spyridon Thermos, Gerasimos Potamianos. 436-445
- Enhancing Artwork Style Clustering via Neural Representation Re-Alignment. Abhishek Dangeti, Pavan Gajula, Vikram Jamwal, Vivek Srivastava. 446-455
- LoRA-Loop: Closing the Synthetic Replay Cycle for Continual VLM Learning. Kaihong Wang, Donghyun Kim 0006, Margrit Betke. 456-465
- Synthetic Hands Meet Legacy Data: A Synthetic Dataset for Structured, Controllable, and Multimodal Evaluation. Menghe Zhang, Haley M. So, Mohammad Asadi, Dongfang Zhao 0017, Yangwen Liang, Shuangquan Wang, Gordon Wetzstein, Kee-Bong Song, Donghoon Kim. 466-477
- On the Generalization of Optical Flow: Quantifying Robustness to Dataset Shifts. Katrin Bauer, Andrés Bruhn, Jenny Schmalfuss. 478-488
- Enhancing Domain Diversity in Synthetic Data Face Recognition with Dataset Fusion. Anjith George, Sébastien Marcel. 489-495
- Express4D: Expressive, Friendly, and Extensible 4D Facial Motion Generation Benchmark. Yaron Aloni, Rotem Shalev-Arkushin, Yonatan Shafir, Guy Tevet, Ohad Fried, Amit Haim Bermano. 496-505
- Hybrid Generative Fusion for Efficient and Privacy-Preserving Face Recognition Dataset Generation. Feiran Li, Qianqian Xu 0001, Shilong Bao, Boyu Han, Zhiyong Yang 0001, Qingming Huang. 506-512
- Mapillary Vistas Validation for Fine-Grained Traffic Signs: A Benchmark Revealing Vision-Language Model Limitations. Sparsh Garg, Abhishek Aich. 513-520
- DeepID Challenge of Detecting Synthetic Manipulations in ID Documents. Pavel Korshunov, Vidit Vidit, Amir Mohammadi, Christophe Ecabert, Nevena Shamoska, Sébastien Marcel, Zeqin Yu, Ye Tian, Jiangqun Ni, Lazar Lazarevic, Renat Khizbullin, Anastasiia Evteeva, Alexey Tochin, Aleksei Grishin, Anjith George, Daniel DeAlcala, Tamás Endrei, Javier Muñoz-Haro, Ruben Tolosana, Rubén Vera-Rodríguez, Aythami Morales, Julian Fierrez, György Cserey, Hardik Sharma, Sachin Chaudhary, Akshay Dudhane, Praful Hambarde, Amit Shukla 0002, Prateek Shaily, Jayant Kumar, Ajinkya Hase, Satish Maurya, Mridul Sharma, Pallav Dwivedi. 521-530
- BeatFormer: Efficient Motion-Robust Remote Heart Rate Estimation Through Unsupervised Spectral Zoomed Attention Filters. Joaquim Comas, Federico Sukno. 531-541
- A High-Throughput Platform to Bench Test Smartphone-Based Heart Rate Measurements Derived from Video. Ming-Zher Poh, Jonathan Wang, Jonathan Hsu, Lawrence Cai, Eric Teasley, James A. Taylor 0001, Jameson K. Rogers, Anupam Pathak, Shwetak N. Patel. 542-548
- Remote Heart Rate Measurement Based on Near Infrared Prior Information. Hang Shao 0001, Chuanfei Hu. 549-556
- VitalVideos-Worldwide: A Large and Diverse rPPG Dataset with Rich Ground Truths. Toye Pieter-Jan. 557-562
- AdeptHEQ-FL: Adaptive Homomorphic Encryption for Federated Learning of Hybrid Classical-Quantum Models with Dynamic Layer Sparing. Md Abrar Jahin, Taufikur Rahman Fuad, Muhammad Firoz Mridha, Nafiz Fahad, Md. Jakir Hossen. 563-572
- Evaluating the Trustworthiness of Foundation Models for Skin Lesion Segmentation. Yusung Chu, Byungho Oh, Sejung Yang. 573-583
- Understanding Dataset Bias in Medical Imaging: A Case Study on Chest X-rays. Ethan Dack, Chengliang Dai. 584-594
- Dino2-DR: A Trustworthy and Explainable Vision Transformer for Cross-Domain Diabetic Retinopathy Grading. Lucia Cascone, Luigi Di Biasi, Giuseppe Genito, Michele Nappi. 595-604
- TrustMatch: Mitigating Pseudo-Label Bias in Semi-Supervised Learning with Trust-Aware Refinement. Hongyang He, Yundi Hong. 605-614
- WaveDamp: Enhancing Natural Robustness in Endoscopy Through Wavelet-Based Frequency Damping. Haiko Middeljans, Carolus H. J. Kusters, Tim J. M. Jaspers, Martijn R. Jong, Rixta A. H. van Eijck van Heslinga, Floor Slooter, Albert J. de Groof, Jacques J. Bergman, Peter H. N. de With, Fons van der Sommen. 615-625
- SAM-SPJunc: Self-Prompting for Junction Detection in Retinal Images via Radius-Based Representations. Minasadat Attari, Kannappan Palaniappan, Filiz Bunyak. 626-634
- Measuring and Addressing Information Leakage in Concept Bottleneck Models. Raffael Schon, Baptiste Abeloos, Stéphane Herbin. 635-643
- Are Medical Image Generative Models Biologically Trustworthy? Suhyun Ahn, Wonjung Park, Jinah Park. 644-654
- Geometric Inductive Priors in Diffusion-Based Optical Flow Estimation. Alberto Pepe, Joan Lasenby, Paulo dos Santos Mendonca. 655-665
- HIVE: A Hyperbolic Interactive Visualization Explorer for Representation Learning. Thijmen Nijdam, Derck W. E. Prinzhorn, Jurgen de Heus, Thomas Brouwer. 666-671
- Flatland and Beyond: Mutual Information Across Geometries. Youssef Wally, Johan Mylius-Kroken, Michael Kampffmeyer, Rezvan Ehsani, Vladan Milosevic, Elisabeth Wetzer. 672-681
- HierVision: Standardized and Reproducible Hierarchical Sources for Vision Datasets. Tejaswi Kasarla, Ruthu Hulikal Rooparaghunath, Stefano D'Arrigo, Gowreesh Mago, Abhishek Jha 0001, Melika Ayoughi, Swasti Shreya Mishra, Ana Manzano Rodríguez, Teng Long 0002, Mina Ghadimi Atigh, Max van Spengler, Pascal Mettes. 682-695
- HAPPI: Hyperbolic Hierarchical Part Prototypes for Image Recognition. Hooman Vaseli, Victoria Wu, Nima Kondori, Nguyen Nhat Minh To, Andrea Fung, Ang Nan Gu, Purang Abolmaesumi. 696-705
- Sparse Hyperbolic Convolutional Networks with Enhanced Object Localization via GradCAM Analysis. Vijayavallabh Jayamanikandan, Jithamanyu Settur, Lokesh K. Rajulapati, Raghunathan Rengaswamy. 706-714
- Do VLMs Have Bad Eyes? Diagnosing Compositional Failures via Mechanistic Interpretability. Ashwath Vaithinathan Aravindan, Abha Jha, Mihir Kulkarni. 715-723
- Explaining Object Detection Through Difference Map. Shujun Xia, Chenyang Zhao, Antoni Chan. 724-733
- Interpretable Open-Vocabulary Referring Object Detection with Reverse Contrast Attention. Drandreb Earl O. Juanico, Rowel O. Atienza, Jeffrey Kenneth Go. 734-743
- GFR-CAM: Gram-Schmidt Feature Reduction for Hierarchical Class Activation Maps. Kaveh Safavigerdini, Bahram Yaghooti, Amir Erfan Zareei Shams Abadi, Kannappan Palaniappan. 744-753
- Rethinking Explainer Trust: A Position on the Inconsistencies of Visual Explanations in Weakly Supervised Segmentation. Ayush Somani, Dilip K. Prasad. 754-763
- 2COOOL: 2nd Workshop on the Challenge of Out-of-Label Hazards in Autonomous Driving. Ali K. AlShami, Ryan Rabinowitz, Maged Shoman, Jianwu Fang, Lukás Picek, Shao-Yuan Lo, Steve Cruz, Khang Nhut Lam, Nachiket Kamod, Lei-Lei Li, Jugal Kalita, Terrance E. Boult. 764-771
- VRU-Accident: A Vision-Language Benchmark for Video Question Answering and Dense Captioning for Accident Scene Understanding. Younggun Kim, Ahmed S. Abdelrahman, Mohamed A. Abdel-Aty. 772-782
- Uncertainty-Aware Likelihood Ratio Estimation for Pixel-Wise Out-of-Distribution Detection. Marc Hölle, Walter Kellermann, Vasileios Belagiannis. 783-793
- Adapt, But Don't Forget: Fine-Tuning and Contrastive Routing for Lane Detection under Distribution Shift. Mohammed Abdul Hafeez Khan, Parth Ganeriwala, Sarah M. Lehman, Siddhartha Bhattacharyya, Amy Alvarez, Natasha A. Neogi. 794-804
- Interpretable Decision-Making for End-to-End Autonomous Driving. Mona Mirzaie, Bodo Rosenhahn. 805-815
- Fourier Domain Adaptation for Traffic Light Detection in Adverse Weather. Ishaan Gakhar, Aryaman Gupta, Aryesh Guha, Amit Agarwal 0005, Ujjwal Verma. 816-825
- Drama-X: A Fine-Grained Intent Prediction and Risk Reasoning Benchmark for Driving. Mihir Godbole, Xiangbo Gao, Zhengzhong Tu. 826-831
- SADWA: Fine-Grained Weather Awareness with Vision-Language Models for Seamless Autonomous Driving in Real Time. Jinwoo Kim, Hayeon O, Youngmin Oh, Kyounghwan An, Donghwan Lee. 832-841
- FlareGS: 4D Flare Removal Using Gaussian Splatting for Urban Scenes. Mayank Chandak, Kuppa Sai Sri Teja, Rahul, Gopi Raju Matta, Vinayak Gupta, Kaushik Mitra. 842-851
- Towards Vision Zero: The TUM Traffic Accid3nD Dataset. Walter Zimmer, Ross Greer, Xingcheng Zhou, Rui Song 0007, Hu Cao, Daniel Lehmberg, Marc Pavel, Ahmed Alaaeldin Ghita, Akshay Gopalkrishnan, Holger Caesar, Mohan M. Trivedi, Alois C. Knoll. 852-862
- Simplifying Traffic Anomaly Detection with Video Foundation Models. Svetlana Orlova, Tommie Kerssies, Brunó Bence Englert, Gijs Dubbelman. 863-873
- LaViPlan: Language-Guided Visual Path Planning with RLVR. Hayeon Oh. 874-883
- Efficient Self-Supervised Adaptation for Medical Image Analysis. Moein Sorkhei, Emir Konuk, Jingyu Guo, Chanjuan Meng, Christos Matsoukas, Kevin Smith 0001. 884-891
- Comparison of Digital Histology AI Models with Low-Dimensional Genomic and Clinical Models in Survival Modeling for Prostate Cancer. Aidan Mcloughlin, Ho YinHo, Xin Zhao, Alexander Karl Hakansson, Alireza Moradi, Qi Joslove Xu, Yang Liu. 892-902
- GADA: Graph Attention-based Detection Aggregation for Ultrasound Video Classification. Li Chen, Naveen Balaraju, Jochen Kruecker, Balasundar Raju, Alvin Chen. 903-911
- Latent Gene Diffusion for Spatial Transcriptomics Completion. Paula Cárdenas, Leonardo Manrique, Daniela Vega, Daniela Ruiz, Pablo Arbeláez. 912-921
- From Cosmos to Clinic: Interpretable Spatial Statistics for Histopathology Prognosis. Attila Barna, Oz Kilim, István Csabai. 922-929
- MPromer: A Unified Diffusion-Based Framework for Scalable and Generalizable Multi-Modal Medical Image Segmentation. Wafa Al Ghallabi, Akshay Dudhane, Syed Waqas Zamir, Salman H. Khan 0001, Fahad Shahbaz Khan. 930-938
- Automated Assessment of Aesthetic Outcomes in Facial Plastic Surgery. Pegah Varghaei, Kiran Abraham-Aggarwal, Manoj T. Abraham, Arun Ross. 939-948
- TextSAM-EUS: Text Prompt Learning for SAM to Accurately Segment Pancreatic Tumor in Endoscopic Ultrasound. Pascal Spiegler, Taha Koleilat, Arash Harirpoush, Corey S. Miller, Hassan Rivaz, Marta Kersten-Oertel, Yiming Xiao. 959-968
- QPolypNet: A Quantum-Inspired Deep Learning Model for Polyp Segmentation. Md Majedul Islam, Rashik Shahriar Akash, Sayed Mehedi Azim, Selena He. 980-989
- EchoNet-Quality: Denoising Echocardiograms via Deep Generative Modeling of Ultrasound Noise. David Choi, Milos Vukadinovic, Bryan He, Christina Binder, Yuki Sahashi, David Ouyang. 990-997
- Breast Cancer Detection with Topological Deep Learning. Brighton Nuwagira, Adrian Rodriguez, Qiwei Li, Baris Coskunuzer. 998-1008
- Endonama: Graph Neural Networks Based In-Silico Transcriptomics from Histology Whole Slide Images for Fertility Diagnostics. George Wright, Paul Brighton, Joanne Muter, Jan Brosens, Fayyaz Minhas. 1009-1018
- Artifact Correction in Panoramic Radiographs Using Deep De-Shadowing. Omri Dan, Samuel Lilek, Ariel Hirschhorn, Lazar Kats, Nahum Kiryati, Arnaldo Mayer. 1019-1027
- Unsupervised Abnormality Segmentation in Chest CT with Anatomy-Guided Latent Diffusion Model and Adaptive Thresholding. Xueqi Guo, Yoshihisa Shinagawa, Sepehr Farhand, Halid Yerebakan, Kritika Iyer, Matthias Wolf 0001, Gerardo Hermosillo Valadez. 1028-1035
- Unsupervised Nuclei Segmentation by Improving Pseudo Labels from Segment Anything Model. Ryota Nakai, Kazuhiro Hotta. 1036-1044
- MedSAM-Guided Curriculum Learning for White Matter Tract Segmentation in Block Face Imaging of Fetal Brain. Athira Kalladayil Shibu, Sriprabha Ramanarayanan, Vinoth Kanna, Jaikishan Jayakumar, Keerthi Ram, Mohanasankar Sivaprakasam. 1045-1052
- MK-UNet: Multi-Kernel Lightweight CNN for Medical Image Segmentation. Md Mostafijur Rahman, Radu Marculescu. 1053-1062
- Render2Seg: Landmark-Guided Patch-Wise Segmentation for Robust 3D Dental Mesh Analysis. Sena Lee, Guenhye Kim, Yongkyu Jin, Sejung Yang. 1063-1072
- DCEtriformer: A Hybrid Attention Transformer for DCE-MRI Synthesis in Prostate Imaging. Sadhana S, Sriprabha Ramanarayanan, Kishore Kumar M, Keerthi Ram, Harsh Agarwal, Ramesh Venkatesan, Mohanasankar Sivaprakasam. 1073-1082
- Increasing the Classification Rates of the Trained Models Using Invariant Dataset Augmentations. Piotr Milczarski. 1083-1092
- BAMPolyp: Bi-Axial Mamba Bottleneck for Gastrointestinal Polyp Segmentation. Md. Farhadul Islam, Tashik Ahmed, Partho Chanda, Joyanta Jyoti Mondal, Meem Arafat Manab, Sarah Zabeen, Jannatun Noor. 1093-1103
- Unsupervised Domain Adaptation via Content Alignment for Hippocampus Segmentation. Hoda Kalabizadeh, Ludovica Griffanti, Pak-Hei Yeung, Ana I. L. Namburete, Nicola K. Dinsdale, Konstantinos Kamnitsas. 1104-1114
- Taming Modern Point Tracking for Speckle Tracking Echocardiography via Impartial Motion. Md Abulkalam Azad, John Nyberg, Håvard Dalen, Bjørnar Leangen Grenne, Lasse Løvstakken, Andreas Østvik. 1115-1124
- MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation. Gurucharan Marthi Krishna Kumar, Aman Chadha, Janine D. Mendola, Amir Shmuel. 1125-1135
- CytoDiff: AI-Driven Cytomorphology Image Synthesis for Medical Diagnostics. Jan Carreras Boada, Rao Muhammad Umer, Carsten Marr. 1136-1144
- Impact of Black-Box Adversarial Attacks on Deep Neural Networks for Skin Imaging. Bartlomiej Moniak, Joanna Jaworek-Korjakowska. 1145-1152
- A Dynamic Agent Framework for Large Language Model Reasoning for Medical and Visual Question Answering. Ziyan Xiao, Ruiyang Zhang, Yushi Feng, Lingting Zhu, Liang Peng, Lequan Yu. 1154-1163
- Generative Counterfactual Augmentation for Bias Mitigation. Jason Uwaeze, Pranav Kulkarni, Vladimir Braverman, Michael A. Jacobs, Vishwa S. Parekh. 1164-1171
- CECT-Mamba: A Hierarchical Contrast-Enhanced-Aware Model for Pancreatic Tumor Subtyping from Multi-Phase CECT. Zhifang Gong, Shuo Gao, Ben Zhao, Yingjing Xu, Yijun Yang, Shenghong Ju, Guangquan Zhou. 1172-1182
- Dual-LVT: A Dual Attention Language-Vision Transformer for Tumor Segmentation. ShiMan Zhang, Songzhu Zheng, Ming Ma. 1183-1192
- A Large-Scale Benchmark of Cross-Modal Learning for Histology and Gene Expression in Spatial Transcriptomics. Rushin H. Gindra, Giovanni Palla, Mathias Nguyen, Sophia J. Wagner, Manuel Tran, Fabian J. Theis, Dieter Saur, Lorin Crawford, Tingying Peng. 1193-1203
- CARDIUM: Congenital Anomaly Recognition with Diagnostic Images and Unified Medical Records. Daniela Vega, Hannah V. Ceballos, Javier S. Vera, Santiago Rodríguez, Alejandra Pérez, Angela Castillo, María Escobar, Dario Londoño, Luis A. Sarmiento, Camila I. Castro, Nadiezhda Rodriguez, Juan C. Briceño, Pablo Arbeláez. 1204-1213
- Towards Robust Multimodal Representation: A Unified Approach with Adaptive Experts and Alignment. Nazanin Moradinasab, Saurav Sengupta, Jiebei Liu, Sana Syed, Donald E. Brown. 1214-1223
- MV-MLM: Bridging Multi-View Mammography and Language for Breast Cancer Diagnosis and Risk Prediction. Shunjie-Fabian Zheng, Hyeonjun Lee, Thijs Kooi, Ali Diba. 1224-1233
- MedBLINK: Probing Basic Perception in Multimodal Language Models for Medicine. Mahtab Bigverdi, Wisdom Oluchi Ikezogwo, Kevin Zhang, Hyewon Jeong, Mingyu Lu, Sungjae Cho, Linda G. Shapiro, Ranjay Krishna. 1234-1244
- Knowledge-Driven Query Network with Adaptive Cross-View Attention for Structured Radiology Report Generation. Xuege Hou, Yali Li 0001, Shengjin Wang. 1245-1254
- 3Seg: Lean Linear Layers for Language-Guided Vision Transformer in Medical Image Segmentation. Rahul Bhardwaj, Utkarsh Yashwant Tambe, Debanga Raj Neog. 1255-1264
- PMC-Vid: A Large-Scale Biomedical Video Captioning Dataset. Yosuke Yamagishi, Kuniaki Saito, Atsushi Hashimoto 0001, Yoshitaka Ushiku. 1265-1275
- Trustworthy Medical Imaging with Large Language Models: A Study of Hallucinations Across Modalities. Anindya Bijoy Das, Shahnewaz Karim Sakib, Shibbir Ahmed. 1276-1283
- Multiplicative Loss for Enhancing Semantic Segmentation in Medical and Cellular Images. Yuto Yokoi, Kazuhiro Hotta. 1284-1292
- Enhancing the Reliability of Auto-Prompting SAM for Medical Image Segmentation with Uncertainty Estimation and Rectification. Yichi Zhang 0007, Shiyao Hu, Le Xue, Sijie Ren, Zixin Hu, Yuan Cheng, Yuan Qi 0001. 1293-1302
- SurGen-Net: A Generative Approach for Surgical VQA with Structured Text Generation. Yongjun Jeon, Seonmin Park, Jongmin Shin, Kanggil Park, Bogeun Kim, Namkee Oh, Kyu-Hwan Jung. 1303-1310
- Not Only Grey Matter: OmniBrain for Robust Multimodal Classification of Alzheimer's Disease. Ahmed Sharshar, Yasser Ashraf, Tameem Bakr, Salma Hassan, Hosam Elgendy, Mohammad Yaqub, Mohsen Guizani. 1311-1320
- Building a General SimCLR Self-Supervised Foundation Model Across Neurological Diseases to Advance 3D Brain MRI Diagnoses. Emily Kaczmarek, Justin Szeto, Brennan Nichyporuk, Tal Arbel. 1321-1330
- Prompt4Trust: A Reinforcement Learning Prompt Augmentation Framework for Clinically-Aligned Confidence Calibration in Multimodal Large Language Models. Anita Kriz, Elizabeth Laura Janes, Xing Shen 0001, Tal Arbel. 1331-1340
- Diffusion-Based Data Augmentation for Medical Image Segmentation. Maham Nazir, Muhammad Aqeel, Francesco Setti. 1341-1350
- Autoregressive-Conditioned Diffusion for Semi-Supervised Thyroid Ultrasound Segmentation with Optical Flow-Based Pseudo Labels. Xiaorui Liu, Yijun Yang, Yingjing Xu, Lei Zhu. 1351-1361
- PCB-SAID: A Low-Cost Camera-Based Dataset for Few-Shot SMD Assembly Inspection. Raffaele Mineo, Amelia Sorrenti, Robin Faro, Gabriele Mineo, Francesco Cancelliere, Alberto Faro. 1362-1368
- Development and Implementation of a Digital Twin for Power Plants: Enhancing Maintenance and Operational Efficiency. Guneet Bhatia, Nagarjun, Jason Voelker. 1369-1376
- Deep Learning-Based Rail Surface Condition Evaluation. Shilin Hu, Ke Ma, Sagnik Das, Dichang Zhang, Dimitris Samaras. 1377-1386
- Two-stage Text-Guided Diffusion Models for Disentangled Industrial Defect Generation. Jing Wei, Qingfeng Shi, Fei Shen 0002, Zhengtao Zhang. 1387-1395
- SupConWI-RL: Wafer Inspection with Reinforcement Learning Enhanced by Supervised Contrastive Learning. Aleksandr Dekhovich, Oleg Soloviev. 1396-1405
- Towards Automated Assembly Quality Inspection with Synthetic Data and Domain Randomization. Xiaomeng Zhu, Jacob Henningsson, Duruo Li, Pär Mårtensson, Lars Hanson, Mårten Björkman, Atsuto Maki. 1406-1414
- WS$^2$: Weakly Supervised Segmentation Using Before-After Supervision in Waste Sorting. Andrea Marelli, Alberto Foresti, Leonardo Pesce, Giacomo Boracchi, Mario Grosso. 1415-1424
- SynSpill: Improved Industrial Spill Detection with Synthetic Data. Aaditya Baranwal, Abdul Mueez, Jason Voelker, Guneet Bhatia, Shruti Vyas. 1425-1434
- InspectVLM: Unified in Theory, Unreliable in Practice. Conor Wallace, Isaac Corley, Jonathan Lwowski. 1435-1443
- iSafetyBench: A Video-Language Benchmark for Safety in Industrial Environment. Raiyaan Abdullah, Yogesh Singh Rawat, Shruti Vyas. 1444-1453
- Robust Anomaly Detection in Industrial Environments via Meta-Learning. Muhammad Aqeel, Shakiba Sharifi, Marco Cristani, Francesco Setti. 1454-1462
- A Contrastive Learning-Guided Confident Meta-Learning for Zero Shot Anomaly Detection. Muhammad Aqeel, Danijel Skocaj, Marco Cristani, Francesco Setti. 1463-1472
- RDDPM: Robust Denoising Diffusion Probabilistic Model for Unsupervised Anomaly Segmentation. Mehrdad Moradi, Kamran Paynabar. 1473-1482
- UniLEAD: A Unified and LightwEight Model for Anomaly Detection. Shih Chih Lin, Shang-Hong Lai. 1483-1491
- RealTalk: Realistic Emotion-Aware Lifelike Talking-Head Synthesis. Wenqing Wang, Yun Fu. 1492-1501
- Toward Socially Aware Vision-Language Models: Evaluating Cultural Competence Through Multimodal Story Generation. Arka Mukherjee, Shreya Ghosh 0006. 1502-1512
- SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection Against Adversarial Attacks. Kutub Uddin, Awais Khan 0007, Muhammad Umar Farooq, Khalid Mahmood Malik. 1513-1522
- CustomMark: Customization of Diffusion Models for Proactive Attribution. Vishal Asnani, John P. Collomosse, Xiaoming Liu 0002, Shruti Agarwal. 1523-1533
- MultiNeRF: Multiple Watermark Embedding for Neural Radiance Fields. Yash Kulthe, Andrew Gilbert, John P. Collomosse. 1534-1543
- FreqPure: A High-Frequency Preservation Diffusion-Based Purification Method for Protective Perturbation Removal. Yan Ju, Hongfei Xue, Siwei Lyu. 1544-1553
- Adaptive Test-Time Semantic Debiasing for AI-Generated Image Detection. Yu Cai, Jiahe Tian, Xiaomeng Fu, Jiao Dai, Jizhong Han, Siwei Lyu. 1554-1563
- CatAID: Category-Guided AI-Generated Image Detection via Vision-Language Model Adaptation. Yu Cai, Shan Jia, Jiahe Tian, Jiao Dai, Jizhong Han, Siwei Lyu. 1564-1574
- Is JPEG AI Going to Change Image Forensics? Edoardo Daniele Cannas, Sara Mandelli, Natasa Popovic, Ayman Alkhateeb, Alessandro Gnutti, Paolo Bestagini, Stefano Tubaro. 1575-1586
- Explainable AI-Generated Image Forensics: A Low-Resolution Perspective with Novel Artifact Taxonomy. Kaustubh Sharma. 1587-1596
- Phoneme-Level Analysis for Person-of-Interest Speech Deepfake Detection. Davide Salvi, Viola Negroni, Sara Mandelli, Paolo Bestagini, Stefano Tubaro. 1597-1606
- PIA: Deepfake Detection Using Phoneme-Temporal and Identity-Dynamic Analysis. Soumyya Kanti Datta, Tanvi Ranga, Chengzhe Sun 0001, Siwei Lyu. 1607-1617
- Adversarial Reality for Evading Deepfake Image Detectors. Umur Aybars Ciftci, Nicholas Solar, Emily Greene, Sophie Riley Saremsky, Ilke Demir. 1618-1629
- Witness Sensing for Verifying the Human Origin of Digital Media. Isabella Lenz, Yu Rong 0002, Daniel W. Bliss, Julie Liss, Visar Berisha. 1630-1639
- The Effect of Semantically Aligned Images for Deepfake Detection. Aritra Bandyopadhyay, Tyme Chatupanyachotikul, Jose L. Garcia, Arijit Ghosh, Karolina Hajkova, Carlos Miguel Patiño, Ivona Najdenkoska. 1640-1649
- WavePaint: Resource-Efficient Token-Mixer for Self-Supervised Inpainting. Pranav Jeevan, Dharshan Sampath Kumar, Amit Sethi. 1650-1659
- FLD+: Data-Efficient Evaluation Metric for Generative Models. Pranav Jeevan, Neeraj Nixon, Amit Sethi. 1660-1668
- V-RoAst: Visual Road Assessment Can VLM be a Road Safety Assessor using the iRAP Standard? Natchapon Jongwiriyanurak, Zichao Zeng, June Moh Goo, Xinglei Wang, Ilya Ilyankou, Kerkritt Srirrongvikrai, Nicola Christie, Meihui Wang, Huanfa Chen, James Haworth. 1669-1678
- Simulating Refractive Distortions and Weather-Induced Artifacts for Resource-Constrained Autonomous Perception. Moseli Mots'oehli, Feimei Chen, Hok Wai Chan, Itumeleng Tlali, Thulani Babeli, Kyungim Baek, Huaijin Chen 0005. 1679-1688
- Nayana: A Foundation for Document-Centric Vision-Language Models via Multi-Task, Multimodal, and Multilingual Data Synthesis. Adithya S. Kolavi, Samarth P, Vyoman Jain. 1689-1698
- LightPrune: Latency-Aware Structured Pruning for Efficient Deep Inference on Embedded Devices. Asma Belhadi, Youcef Djenouri, Ahmed Nabil Belbachir. 1699-1708
- Rethinking Backbone Design for Lightweight 3D Object Detection in LiDAR. Adwait Chandorkar, Hasan Tercan, Tobias Meisen. 1709-1717
- Contextual Convolutions for Scalable Forward-Only Learning on Tiny Devices. Mehdi Abbassi, Alberto Ancilotto, Elisabetta Farella. 1718-1725
- Implementation of Extremely Low Power Visual Perception Algorithm on Programmable Vision Accelerator: Examples of Challenges and Solutions. Sanmati Kamath, Ching Hung, Jagadeesh Sankaran, Zoran Nikolic, Eric Viscito, Branislav Kisacanin. 1726-1734
- Scalable Optical Convolutional Neural Networks for Edge Applications. Venkata Anirudh Puligandla, Vladimir Ceperic, Tihomir Knezevic. 1735-1744
- Inside Knowledge: Graph-based Path Generation with Explainable Data Augmentation and Curriculum Learning for Visual Indoor Navigation. Daniel Airinei, Elena Burceanu, Marius Leordeanu. 1745-1754
- Efficient Self-Supervised Neuro-Analytic Visual Servoing for Real-Time Quadrotor Control. Sebastian Mocanu, Sebastian-Ion Nae, Mihai-Eugen Barbu, Marius Leordeanu. 1755-1764
- Contextual-Personalized Adaptive Cruise Control via Fine-Tuned Large Language Models. Ziye Qin, Xue Yao, Chuheng Wei, Ang Ji, Guoyuan Wu 0001, Zhanbo Sun. 1765-1773
- RG-Attn: Radian Glue Attention for Multi-Modal Multi-Agent Cooperative Perception. Lantao Li, Kang Yang, Wenqi Zhang, Xiaoxue Wang, Chen Sun. 1774-1783
- D3FNet: A Differential Attention Fusion Network for Fine-Grained Road Structure Extraction in Remote Perception Systems. Chang Liu, Yang Xu, Tamás Szirányi. 1784-1793
- Understanding what Vision-Language Models See in Traffic: PixelSHAP for Object-Level Attribution in Autonomous Driving. Roni Goldshmidt. 1794-1802
- SlimComm: Doppler-Guided Sparse Queries for Bandwidth-Efficient Cooperative 3-D Perception. Melih Yazgan, Qiyuan Wu, Iramm Hamdard, Shiqi Li, J. Marius Zoellner. 1803-1812
- Robust Scenario Mining Assisted by Multimodal Semantics. Yifei Chen, Ross Greer. 1813-1822
- MAP: End-to-End Autonomous Driving with Map-Assisted Planning. Huilin Yin, Yiming Kan, Daniel Watzenig. 1823-1830
- SpaRC-AD: A Baseline for Radar-Camera Fusion in End-to-End Autonomous Driving. Philipp Wolters, Johannes Gilg, Torben Teepe, Gerhard Rigoll. 1831-1841
- Improving Event-Phase Captions in Multi-View Urban Traffic Videos via Prompt-Aware LoRA Tuning of Vision Language Models. Ricardo Ornelas, Alton Chao, Shyam Gupta, Edmund Chao, Ross Greer. 1842-1848
- Research Challenges and Progress in the End-to-End V2X Cooperative Autonomous Driving Competition. Ruiyang Hao, Haibao Yu, Jiaru Zhong, Chuanye Wang, Jiahao Wang, Yiming Kan, Wenxian Yang, Siqi Fan 0002, Huilin Yin, Jianing Qiu, Yao Mu 0001, Jiankai Sun, Li Chen 0008, Walter Zimmer, Dandan Zhang, Shanghang Zhang, Mac Schwager, Ping Luo 0002, Zaiqing Nie. 1849-1860
- Controllable Pedestrian Video Editing for Multi-View Driving Scenarios via Motion Sequence. Danzhen Fu, Jiagao Hu, Daiguo Zhou, Fei Wang, Zepeng Wang, Wenhua Liao. 1861-1870
- SEED-Story: Multimodal Long Story Generation with Large Language Model. Shuai Yang, Yuying Ge, Yang Li, Yukang Chen, Yixiao Ge, Ying Shan, Yingcong Chen. 1871-1881
- UniPaint: Unified Space-Time Video Inpainting via Mixture-of-Experts. Zhen Wan, Chenyang Qi, Zhiheng Liu, Tao Gui, Yue Ma. 1882-1892
- Sketch-to-Layout: Sketch-Guided Multimodal Layout Generation. Riccardo Brioschi, Aleksandr Alekseev, Emanuele Nevali, Berkay Döner, Omar El Malki, Blagoj Mitrevski, Leandro Kieliger, Mark Collier, Andrii Maksai, Jesse Berent, Claudiu Cristian Musat, Efi Kokiopoulou. 1893-1905
- Null Text-Guided Interactive Image Editing for Diffusion ModelsJing Wang, Hao Luo. 1906-1915 [doi]
- DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion PriorsHanwen Zhu, Ruining Li, Tomas Jakab. 1916-1926 [doi]
- Concat-ID: Towards Universal Identity-Preserving Video SynthesisYong Zhong, Zhuoyi Yang, Jiayan Teng, Xiaotao Gu, Chongxuan Li. 1927-1936 [doi]
- Beyond Flat Text: Dual Self-Inherited Guidance for Visual Text GenerationMinxing Luo, Zixun Xia, Liaojun Chen, Zhenhang Li, Weichao Zeng, Jianye Wang, Wentao Cheng, Yaxing Wang, Yu Zhou, Jian Yang. 1937-1946 [doi]
- DepthDance: Complex-pose Human Image Animation with Appearance-agnostic Depth GuidanceYingjie Xi, Zhengze Xu, Zhao Wang, Xiaosong Yang, Jinsong Lan, Jian-Jun Zhang, Mengting Chen. 1947-1957 [doi]
- Model as a Game: On Numerical and Spatial Consistency for Generative GamesJingye Chen, YuZhong Zhao, Yupan Huang, Lei Cui 0001, Li Dong 0004, Tengchao Lv, Qifeng Chen 0001, Furu Wei. 1958-1967 [doi]
- StyleBooth: Image Style Editing with Multimodal InstructionZhen Han, Chaojie Mao, Zeyinzi Jiang, Yulin Pan, Jingfeng Zhang. 1968-1978 [doi]
- ACE++: Instruction-Based Image Creation and Editing via Context-Aware Content FillingChaojie Mao, Jingfeng Zhang, Yulin Pan, Zeyinzi Jiang, Zhen Han, Yu Liu, Jingren Zhou 0001. 1979-1987 [doi]
- 2D Instance Editing in 3D SpaceYuHuan Xie, Aoxuan Pan, Ming-Xian Lin, Wei Huang, Yi-Hua Huang, Xiaojuan Qi 0001. 1988-1995 [doi]
- Axes-and-Tags: LLM-Driven Design Galleries for Generative ContentAsanshay Gupta, Vishnu Sarukkai, Kayvon Fatahalian. 1996-2005 [doi]
- MFT-VITON: High-Fidelity Virtual Try-On with Minimal Input via a Mask-Free Transformer-Diffusion ModelZhenchen Wan, Yanwu Xu 0003, Dongting Hu, Weilun Cheng, Tianxi Chen, Zhaoqing Wang, Feng Liu 0003, Tongliang Liu, Mingming Gong. 2006-2015 [doi]
- Enhancing Identity Preservation in Portrait Generation via Reward OptimizationYang Liu 0356, Hongyu Zang, Chao Xu, Baigui Sun, Shan Luo 0001. 2016-2025 [doi]
- GenEscape: Hierarchical Multi-Agent Generation of Escape Room PuzzlesMengyi Shan, Brian Curless, Ira Kemelmacher-Shlizerman, Steven M. Seitz. 2026-2034 [doi]
- Superior and Pragmatic Talking Face Generation with Teacher-Student FrameworkChao Liang, Jianwen Jiang, Tianyun Zhong, Gaojie Lin, Zhengkun Rong, Yongming Zhu, Jiaqi Yang 0008, Xin Chen. 2035-2044 [doi]
- Causal Motion Tokenizer for Streaming Motion GenerationBiao Jiang, Xin Chen 0040, Ailing Zeng, Xinru Sun, Fukun Yin, Xianfang Zeng, Xuanyang Zhang, Gang Yu 0002, Tao Chen 0003. 2045-2055 [doi]
- ID-Consistent, Precise Expression Generation with Blendshape-Guided DiffusionFoivos Paraperas Papantoniou, Stefanos Zafeiriou. 2056-2065 [doi]
- Automated Detection of Antarctic Benthic Organisms in High-Resolution in Situ Imagery to Aid Biodiversity MonitoringCameron Trotter, Huw J. Griffiths, Tasnuva Ming Khan, Rowan J. Whittle. 2066-2076 [doi]
- The Point is the Mask: Scaling Coral Reef Segmentation with Weak SupervisionMatteo Contini, Victor Illien, Sylvain Poulain, Serge Bernard, Julien Barde, Sylvain Bonhommeau, Alexis Joly. 2077-2086 [doi]
- A Calibration Tool for Refractive Underwater VisionFelix Seegräber, Mengkun She, Felix Woelk, Kevin Köser. 2087-2095 [doi]
- Sea-ing Through Scattered Rays: Revisiting the Image Formation Model for Realistic Underwater Image GenerationVasiliki Ismiroglou, Malte Pedersen, Stefan Hein Bengtson, Andreas Aakerberg, Thomas B. Moeslund. 2096-2105 [doi]
- Uncovering Anomalous Events for Marine Environmental Monitoring via Visual Anomaly DetectionLaura Weihl, Nejc Novak, Stefan Hein Bengtson, Malte Pedersen. 2106-2115 [doi]
- Refining Naive Annotations with Limited Expert Guidance for Semantic Segmentation: A Case Study on Underwater EchogramsMelissa Cote, Amanda Dash, Julek Chawarski, Alexandra Branzan Albu, Femina Senjaliya, Andrea Niemi. 2116-2125 [doi]
- DebrisVision: Bridging the Synthetic-to-Real Gap for Enhanced Underwater Debris AnalysisSivaji Retta, Sai Manikanta Eswar Machara, Iyyakutti Iyappan Ganapathi, Divya Velavudhan, Naoufel Werghi. 2126-2135 [doi]
- The Coralscapes Dataset: Semantic Scene Understanding in Coral ReefsJonathan Sauder, Viktor Domazetoski, Guilhem Banc-Prandi, Gabriela Perna, Anders Meibom, Devis Tuia. 2136-2143 [doi]
- Weakly Supervised MaxN Estimation in Baited Remote Underwater VideosAmelia Sorrenti, Leonardo G. Russo, Sarinda Samarasinghe, Simone Palazzo, Concetto Spampinato. 2144-2152 [doi]
- A Multi-Purpose Tracking Framework for Salmon Welfare Monitoring in Challenging EnvironmentsEspen Uri Høgstedt, Christian Schellewald, Annette Stahl, Rudolf Mester. 2153-2162 [doi]
- UGPL: Uncertainty-Guided Progressive Learning for Evidence-Based Classification in Computed TomographyShravan Venkatraman, Pavan Kumar S, Rakesh Raj Madavan, Chandrakala S. 2163-2173 [doi]
- Benchmarking Out-of-Distribution Detection for Plankton Recognition: A Systematic Evaluation of Advanced Methods in Marine Ecological MonitoringYingzi Han, Jiakai He, Chuanlong Xie, Jianping Li. 2163-2173 [doi]
- Steller Sea Lion Counting Across Multiple Aerial CamerasMatthew Dawkins, Jon Crall, Katie Sweeney, Burlyn Birkemeier, Neal Siekierski, Dawei Du. 2174-2182 [doi]
- KAMERA: Enhancing Aerial Surveys of Ice-Associated Seals in Arctic EnvironmentsAdam Romlein, Benjamin X. Hou, Yuval Boss, Cynthia L. Christman, Stacie Koslovsky, Erin E. Moreland, Jason Parham, Anthony Hoogs. 2183-2192 [doi]
- CoralVOS: Dataset and Benchmark for Dense Coral Video SegmentationZiqiang Zheng, Haixin Liang, Xie Yaofeng, Zhibin Yu 0002, Sai Kit Yeung. 2193-2203 [doi]
- Task-Driven Neural Adaptive Gain Control: 16-bit-to-8-bit Thermal Tone Mapping for Superior Object DetectionHossein Javidnia. 2204-2212 [doi]
- Towards a Generalizable Fusion Architecture for Multimodal Object DetectionJad Berjawi, Yoann Dupas, Christophe Cérin. 2213-2221 [doi]
- TY-RIST: Tactical YOLO Tricks for Real-Time Infrared Small Target DetectionAbdulkarim Atrash, Seyda Ertekin, Ömür Ugur, Omar Moured, Yufan Chen, Jiaming Zhang 0001. 2222-2231 [doi]
- Would SWIR Modality Help for Detection and Segmentation in Harsh Weather Conditions? An Experimental StudyRohan Mehra, Alexandre Riffard, Mathieu Labussière, Pierre Duthon, Romuald Aufrère. 2232-2240 [doi]
- IVIFormer: Illumination-Aware Infrared-Visible Image Fusion via Adaptive Domain-Switching Cross AttentionIncheol Park, Youngwan Jin, Yagiz Nalçakan, Hyeongjin Ju, Sanghyeop Yeo, Shiho Kim. 2241-2250 [doi]
- Video Clinical Outcome Assessments (vCOAs): A Scalable Workflow for Remote TrialsRabia Aziza, Clare Matthews, Elisa Ferrer, Fereshteh Poursaeed, Alejandro Mendoza Garcia, Elin Haf Davies. 2251-2255 [doi]
- Positioning a Video to Mesh Pipeline for 3D Foot Reconstruction as a Monitoring Tool for Peripheral EdemaWei Yan Peh, Hui Zhang, Jian Yang. 2256-2263 [doi]
- Magnol.AI Copilot: Multimodal LLMs for Natural, Accessible Data InteractionGuangchen Ruan, Hui Zhang, Hui Xiao, Jian Yang. 2264-2272 [doi]
- Patient-Centric Statistical Multi-Modal Fusion for Medical Diagnosis: Integrating DICOM, Radiomics, and Patient AttributesSeo-Yeon Choi, Kyungsu Lee. 2273-2284 [doi]
- Multi-Modal Deep Clustering Survival Machines for Alzheimer's Disease Subtype DiscoveryZixuan Wen, Bojian Hou, Weiqing He, Shu Yang 0009, Jason H. Moore, Andrew J. Saykin, Heng Huang 0001, Paul M. Thompson, Marylyn D. Ritchie, Christos Davatzikos, Li Shen 0001. 2285-2293 [doi]
- Adapting Vision-Language Models for 3D CT/MRI Understanding on PMBB via Slice Selection and Explanation AnalysisHongzhuo Chen, Rahul Shukla, Ruiming Wu, Shu Yang, Duy Duong Tran, Duy Minh Ho Nguyen, Mathias Niepert, Cameron Beeche, James C. Gee, Jeffrey T. Duda, Rakesh Sharma, Christos Davatzikos, Walter R. Witschey, Bojian Hou, Li Shen 0001. 2294-2303 [doi]
- Privacy Preservation Using Superimposed 3D-Models for Self-Supervised Training in Action RecognitionAsfandyar Azhar, Nidhish Shah, Shaurjya Mandal, Yongjie Jessica Zhang. 2304-2314 [doi]
- Pulling Back the Curtain: Unsupervised Adversarial Detection via Contrastive Auxiliary NetworksEylon Mizrahi, Raz Lapid, Moshe Sipper. 2315-2326 [doi]
- Safety Without Semantic Disruptions: Editing-free Safe Image Generation via Context-preserving Dual Latent ReconstructionJordan Vice, Naveed Akhtar, Mubarak Shah, Richard I. Hartley, Ajmal Mian. 2327-2337 [doi]
- On the Importance of Conditioning for Privacy-Preserving Data AugmentationJulian Lorenz, Katja Ludwig, Valentin Haug, Rainer Lienhart. 2338-2347 [doi]
- FrEVL: Leveraging Frozen Pretrained Embeddings for Efficient Vision-Language UnderstandingEmmanuelle Bourigault, Pauline Bourigault. 2348-2357 [doi]
- Recommendation by Generation: Generation Augmented Complementary Fashion Item Retrieval Using Incomplete OutfitGaurab Bhattacharya, Vivek B. S., P. Rajith Bhargav, Jayavardhana Gubbi, Bagya Lakshmi V, Arpan Pal 0001. 2358-2367 [doi]
- HyperVLM: Hyperbolic Space Guided Vision Language Modeling for Hierarchical Multi-Modal UnderstandingSarthak Srivastava, Kathy Wu. 2368-2379 [doi]
- SGBD: Sharpness-Aware Mirror Gradient with BLIP-Based Denoising for Robust Multimodal Product RecommendationSarthak Srivastava, Kathy Wu. 2380-2389 [doi]
- Sari Sandbox: A Virtual Retail Store Environment for Embodied AI AgentsJanika Deborah Gajo, Gerarld Paul Merales, Jerome Escarcha, Brenden Ashley Molina, Gian Nartea, Emmanuel G. Maminta, Juan C. Roldán, Rowel O. Atienza. 2390-2399 [doi]
- RetailAction: Dataset for Multi-View Spatio-Temporal Localization of Human-Object Interactions in RetailDavide Mazzini, Alberto Raimondi, Bruno Abbate, Daniel Fischetti, David M. Woollard. 2400-2408 [doi]
- Relative Pose Regression with Pose Auto-Encoders: Enhancing Accuracy and Data Efficiency for Retail ApplicationsYoli Shavit, Yosi Keller. 2409-2417 [doi]
- DualFit: A Two-Stage Virtual Try-On via Warping and SynthesisMinh Tran, Johnmark Clements, Annie Prasanna, Tri Nguyen 0005, Ngan Le. 2418-2428 [doi]
- Lessons and Winning Solutions in Industrial Object Detection and Pose Estimation from the 2025 Bin-Picking Perception ChallengeZiqin Huang, Chengxi Li, Yingyue Li, Xingyu Liu, Chenyangguang Zhang, Ruida Zhang, Bowen Fu, Xinggang Hu, Yun Qu, Mengge Liu, Yixiu Mao, Wendong Huang, Gu Wang 0001, Xiangyang Ji. 2429-2435 [doi]
- Learning Point Cloud Representations with Pose Continuity for Depth-Based Category-Level 6D Object Pose EstimationZhujun Li, Shuo Zhang, Ioannis Stamos. 2436-2446 [doi]
- MR6D: Benchmarking 6D Pose Estimation for Mobile RobotsAnas Gouda, Shrutarv Awasthi, Christian Blesing, Lokeshwaran Manohar, Frank Hoffmann, Alice Kirchheim. 2447-2455 [doi]
- Design Practices and Lessons from Deploying On-device Vision-Language Interaction in Robotic Guide DogsJinse Kwon, Jemin Lee, Yongin Kwon. 2456-2465 [doi]
- Understanding Touch Through Latent Spaces: Can Images and Haptic Maps Reflect Human Perception?Antonio Luigi Stefani, Sara Baldoni, Niccolò Bisagno, Federica Battisti, Nicola Conci, Francesco G. B. De Natale. 2466-2475 [doi]
- LLMs as NAO Robot 3D Motion PlannersRiccardo Catalini, Giacomo Salici, Federico Biagi, Guido Borghi, Luigi Biagiotti, Roberto Vezzani. 2476-2486 [doi]
- CrossCompanion: An Empathetic Real-Time Assistant Supporting Street Crossing for Low-Vision UsersJiazhao Liang, Yi Fang. 2487-2496 [doi]
- TriPlanNet: Triangle Path Planning Network for A Variable Truss Robot with Deep LearningChoonghan Lee, Leah Harris, SeHyun Oh, JooHyoung Cha, Jemin Lee, Yongin Kwon, Andrew Jang-ho Bae. 2497-2506 [doi]
- Flow-Guided Policies: Overcoming Diffusion Limitations for Robust Robot Imitation LearningChanhyuk Jung, Sangwon Kim, Kwang-Ju Kim, Dasom Ahn, Joonki Baek, Sungkeun Yoo, Byoung Chul Ko. 2507-2512 [doi]
- Towards Proactive Social Robots: Distilling Visual Knowledge from Large Vision-Language ModelsGiuseppe De Simone, Loris Roveda, Alessia Saggese, Mario Vento. 2513-2523 [doi]
- Visual Language Model-Based Food Safety Support for Persons with Blindness and Low VisionRyan Hansen, Hardik Setia, Giles Hamilton-Fletcher, Aryan Jain, Zirui Liu, Mariam Zoair, Reem Aboutaleb, Qing Wen, Yu Li, John-Ross Rizzo. 2524-2533 [doi]
- Towards Effective Human-in-the-Loop Assistive AI AgentsFilippos Bellos, Yayuan Li, Cary Shu, Ruey Day, Jeffrey Mark Siskind, Jason J. Corso. 2534-2543 [doi]
- NaVIP: A Low-Cost, Infrastructure-Free Indoor Navigation Solution for Visually Impaired PersonsJun Yu, Yitian Yang, Vinod Namboodiri. 2544-2553 [doi]
- Genμ: The Generative Machine Unlearning ChallengeKartik Thakral, Shreyansh Pathak, Tamar Glaser, Tal Hassner, Diego Garcia-Olano, Iacopo Masi, Richa Singh 0001, Mayank Vatsa. 2554-2562 [doi]
- QAQ: Quality Adaptive Quantization for LLM KV CacheWen Cheng, Shichen Dong, Jiayu Qin, Wei Wang. 2563-2571 [doi]
- Guardians of Generation: Dynamic Inference-Time Copyright Shielding with Adaptive Guidance for AI Image GenerationSoham Roy, Abhishek Mishra, Aakash Sen Sharma, Shirish S. Karande, Murari Mandal. 2572-2582 [doi]
- Segmentation by Merging Models Specialized to Each ClassKazuya Shibata, Kazuhiro Hotta. 2583-2591 [doi]
- Bias-Aware Machine Unlearning: Towards Fairer Vision Models via Controllable ForgettingSai Siddhartha Chary Aylapuram, Veeraraju Elluru, Shivang Agarwal. 2592-2600 [doi]
- Open-Ended 3D Point Cloud Instance SegmentationPhuc Nguyen, Minh Luu, Anh Tran 0001, Cuong Pham 0001, Khoi Nguyen 0001. 2601-2611 [doi]
- Sparse Multiview Open-Vocabulary 3D DetectionOlivier Moliner, Viktor Larsson, Kalle Åström. 2612-2621 [doi]
- Total-Editing: Head Avatar with Editable Appearance, Motion, and LightingYizhou Zhao, Chunjiang Liu, Haoyu Chen, Bhiksha Raj, Min Xu 0009, Tadas Baltrusaitis, Mitch Rundle, HsiangTao Wu, Kamran Ghasedi. 2622-2633 [doi]
- EVA-Gaussian: 3D Gaussian-Based Real-Time Human Novel View Synthesis Under Diverse Multi-View Camera SettingsYingdong Hu, Zhening Liu, Jiawei Shao, Zehong Lin, Jun Zhang 0004. 2634-2643 [doi]
- PU-Gaussian: Point Cloud Upsampling using 3D Gaussian RepresentationMahmoud Khater, Mona Strauss, Philipp von Olshausen, Alexander Reiterer. 2644-2652 [doi]
- CSG-Fusion: Consistent Sparse-View Gaussian Splatting via Matching-based FusionYan Xia, Wenbo Ji, Weirong Chen, Daniel Cremers. 2653-2662 [doi]
- Artist-Created Mesh Generation from Raw ObservationYao He, Youngjoong Kwon, Wenxiao Cai, Ehsan Adeli 0001. 2663-2668 [doi]
- DAPS-AGF: Depth-Aware Perceptual Similarity with Adaptive Gradient Filtering for Enhanced Outdoor Scene ReconstructionAqsa Yousaf, Arkajyoti Mitra, Paul Agbaje, Afia Anjum, Habeeb Olufowobi. 2669-2677 [doi]
- PointSeg: A Training-Free Paradigm for 3D Scene Segmentation via Foundation ModelsQingdong He, Jinlong Peng, Zhengkai Jiang 0001, Xiaobin Hu, Jiangning Zhang. 2678-2688 [doi]
- DeepCollide: Scalable Data-Driven High DoF Configuration Space Modeling Using Implicit Neural RepresentationsGabriel Guo, Judah Goldfeder, Aniv Ray, Tony Dear, Hod Lipson. 2689-2699 [doi]
- Learning Robust Aligned Representations Across Multiple Visual Modalities in Human Action RecognitionDavid J. Lerch, Bastian Rothenburger, Zeyun Zhong, Manuel Martin, Frederik Diederichs, Rainer Stiefelhagen. 2700-2710 [doi]
- EgoOops: A Dataset for Mistake Action Detection from Egocentric Videos referring to Procedural TextsYuto Haneji, Taichi Nishimura, Hirotaka Kameko, Keisuke Shirai, Tomoya Yoshida, Keiya Kajimura, Koki Yamamoto, Taiyu Cui, Tomohiro Nishimoto, Shinsuke Mori. 2711-2721 [doi]
- EHWGesture - A Dataset for Multimodal Understanding of Clinical GesturesGianluca Amprimo, Alberto Ancilotto, Alessandro Savino, Fabio Quazzolo, Claudia Ferraris, Gabriella Olmo, Elisabetta Farella, Stefano Di Carlo. 2722-2731 [doi]
- EASG-Bench: Video Q&A Benchmark with Egocentric Action Scene GraphsIvan Rodin, Tz-Ying Wu, Kyle Min 0001, Sharath Nittur Sridhar, Antonino Furnari, Subarna Tripathi, Giovanni Maria Farinella. 2732-2737 [doi]
- CricTAL: Introducing Temporal Activity Localisation Using Pose Estimation to Identify Key Phases in Cricket Batting for Downstream Action Quality AssessmentTevin Moodley, Dustin van der Haar. 2738-2746 [doi]
- InstaPose: Scene-Aware Pose Recommendation via Vision Transformers and Diversity-Optimized RerankingYu Ji, I-Han Hsiao. 2747-2753 [doi]
- Assessing the Quality of Soccer Shots from Single-Camera Video with Vision-Language Models and Motion FeaturesFilip Noworolnik, Joanna Jaworek-Korjakowska. 2754-2761 [doi]
- Forecasting and Visualizing Air Quality from Sky Images with Vision-Language ModelsMohammad Saleh Vahdatpour, Maryam Eyvazi, Yan-Qing Zhang 0001. 2762-2770 [doi]
- PerFairX: Is There a Balance Between Fairness and Personality in Large Language Model Recommendations?Chandan Kumar Sah. 2771-2780 [doi]
- MCL for MLLMs: Benchmarking Forgetting in Task-Incremental Multimodal LearningZichao Li. 2781-2787 [doi]
- DROID-Splat: Combining End-to-End SLAM with 3D Gaussian SplattingChristian Homeyer, Leon Begiristain, Christoph Schnörr. 2788-2798 [doi]
- Towards High-Resolution Alignment and Super-Resolution of Multi-Sensor Satellite ImageryPhilip Wootaek Shin, Vishal Gaur, Rahul Ramachandran, Manil Maskey, Jack Sampson, Vijaykrishnan Narayanan, Sujit Roy. 2799-2807 [doi]
- GasTwinFormer: A Hybrid Vision Transformer for Livestock Methane Emission Segmentation and Dietary Classification in Optical Gas ImagingToqi Tahamid Sarker, Mohamed G. Embaby, Taminul Islam, Amer AbuGhazaleh, Khaled Ahmed. 2808-2817 [doi]
- ViewDelta: Scaling Scene Change Detection Through Text-ConditioningSubin Varghese, Joshua Gao, Vedhus Hoskere. 2818-2828 [doi]
- Tree Mapping with Limited Data: Fine-Tuning Foundation Models for Multimodal FusionXiaoyan Lu, Qihao Weng. 2829-2834 [doi]
- ViT-Koop: Vision-Transformer-Koopman Operators for Efficient Time-Series Forecasting of Earth-Observation DataTakayuki Shinohara, Hidetaka Saomoto. 2835-2844 [doi]
- Resolution Revolution: A Physics-Guided Deep Learning Framework for Spatiotemporal Temperature ReconstructionShengjie Liu, Lu Zhang, Siqin Wang. 2845-2854 [doi]
- PyroFocus: A Deep Learning Approach to Real-Time Wildfire Detection in Multispectral Remote Sensing ImageryMark Moussa, Andre Williams, Seth Roffe, Douglas Morton. 2855-2864 [doi]
- LEGNet: A Lightweight Edge-Gaussian Network for Low-Quality Remote Sensing Image Object DetectionWei Lu, Sibao Chen 0001, Hui-Dong Li, Qing-Ling Shu, Chris H. Q. Ding, Jin Tang 0001, Bin Luo 0001. 2865-2874 [doi]
- Inland Excess Water (IEW) Monitoring Using Sentinel-1/2: A SplitClass Segmentation and Temporal Gap-Filling ApproachYahya Ibrahim, Márta Belényesi, Chang Liu, Mátyás Richter-Cserey, Máté Simon, Tamás Szirányi, Csaba Benedek. 2875-2885 [doi]
- Towards Large Scale Geostatistical Methane Monitoring with Part-Based Object DetectionAdhemar de Senneville, Xavier Bou, Jean Louis Bonne, Nicolas Dumelié, Rafael Grompone von Gioi, Thibaud Ehret, Gabriele Facciolo. 2886-2896 [doi]
- IRLTrees3D: A 3D Reconstruction Dataset of TreesJoseph Chai, Barry O'Sullivan, Hoang D. Nguyen. 2897-2902 [doi]
- Evaluating Performance of Reinforcement Learning Agents to Control Buildings EfficientlyJudah Goldfeder, Gabriel Guerra Trigo, Philippe Martin Wyder, Neil Kachappilly, Hod Lipson. 2903-2909 [doi]
- HA-RDet: Hybrid Anchor Rotation Detector for Oriented Object DetectionPhuc Nguyen. 2910-2919 [doi]
- Multi-Scale Hybrid CNN-Transformer for Smoke Detection in Satellite ImagesTony Zhang, Robert P. Dick. 2920-2926 [doi]
- Context-Aware Masking and Learnable Diffusion-Guided Patch Refinement in Transformers via Sparse Supervision for Hyperspectral Image ClassificationAbhiroop Chatterjee, Susmita Ghosh, Ashish Ghosh. 2927-2936 [doi]
- Combining Transformers and CNNs for Efficient Object Detection in High-Resolution Satellite ImageryNicolas Drapier, Aladine Chetouani, Aurélien Chateigner. 2937-2946 [doi]
- AQUAH: Automatic Quantification and Unified Agent in HydrologySongkun Yan, Zhi Li 0066, Siyu Zhu 0002, Yixin Wen, Mofan Zhang, Mengye Chen, Jie Cao 0010, Yang Hong 0001. 2947-2956 [doi]
- BiodiverseNet: Multitask Learning on Fused Multispectral and Radar Data for Scalable Ecosystem MonitoringPrasanth Yadla. 2957-2964 [doi]
- Plantation Bench: A Multiscale, Multimodal Remote Sensing Benchmark for Plantation Mapping Under Distribution ShiftAngela Tsao, David B. Lobell. 2965-2975 [doi]
- Watch, Listen, Understand, Mislead: Tri-Modal Adversarial Attacks on Short Videos for Content Appropriateness EvaluationSahid Hossain Mustakim, S. M. Jishanul Islam, Ummay Maria Muna, Montasir Chowdhury, Mohammed Jawwadul Islam, Sadia Ahmmed, Tashfia Sikder, Syed Tasdid Azam Dhrubo, Swakkhar Shatabda. 2976-2985 [doi]
- Hashtag2Action: Data Engineering and Self-Supervised Pre-Training for Action Recognition in Short-Form VideosYang Qian, Ali Kargarandehkordi, YiNan Sun, Parnian Azizian, Onur Cezmi Mutlu, Saimourya Surabhi, Zain Jabbar, Dennis Paul Wall, Peter Washington 0001, Huaijin Chen. 2986-2996 [doi]
- End-to-End Action Segmentation TransformerTieqiao Wang, Sinisa Todorovic. 2997-3006 [doi]
- Difformer for Action SegmentationNicolas Aziere, Tieqiao Wang, Sinisa Todorovic. 3007-3016 [doi]
- Low-Bit FlashAttention Accelerated Operator Design Based on TritonJinyang Du, Jinyang Guo, Yifu Ding. 3017-3026 [doi]
- Pruning by Block Benefit: Exploring the Properties of Vision Transformer Blocks During Domain AdaptationPatrick Glandorf, Bodo Rosenhahn. 3027-3037 [doi]
- Kernel-Based Motion Free B-Frame Coding for Neural Video CompressionThang V. Nguyen. 3038-3046 [doi]
- Efficient Depth- and Spatially-Varying Image Simulation for Defocus DeblurXinge Yang, Chuong Nguyen, Wenbin Wang, Kaizhang Kang, Wolfgang Heidrich, Xiaoxing Li. 3047-3057 [doi]
- VCMamba: Bridging Convolutions with Multi-Directional Mamba for Efficient Visual RepresentationMustafa Munir, Alex Zhang, Radu Marculescu. 3058-3067 [doi]
- Leveraging Learned Image Prior for 3D Gaussian CompressionSeungjoo Shin, Jaesik Park, Sunghyun Cho. 3068-3077 [doi]
- L-GGSC: Learnable Graph-Based Gaussian Splatting CompressionAkihiro Kuwabara, Hinata Kirihara, Sorachi Kato, Toshiaki Koike-Akino, Takuya Fujihashi. 3078-3084 [doi]
- Fisheye Image Augmentation for Overcoming Domain Gaps with the Limited DatasetHyeseong Lee, Sunmin Park, Miyoung Lee. 3085-3093 [doi]
- From Binary to Semantic: Utilizing Large-Scale Binary Occupancy Data for 3D Semantic Occupancy PredictionChihiro Noguchi, Takaki Yamamoto. 3094-3103 [doi]
- Relevance-Guided Activation Sparsification for Bandwidth-Efficient Collaborative InferenceMaximilian Andreas Hoefler, Daniel Becking, Karsten Müller 0001, Detlev Marpe, Wojciech Samek. 3104-3113 [doi]
- Tiny-vGamba: Distilling Large Vision-(Language) Knowledge from CLIP into a Lightweight vGamba NetworkYunusa Haruna, Adamu Lawan, Ismail Abdulrashid, Abdulganiyu Abdu Yusuf. 3114-3122 [doi]
- Your Super Resolution Model is not Enough for Tackling Real-World ScenariosDongsik Yoon, JongEun Kim. 3123-3129 [doi]
- Linear Attention with Global Context: A Multipole Attention Mechanism for Vision and PhysicsAlex Colagrande, Paul Caillon, Eva Feillet, Alexandre Allauzen. 3130-3139 [doi]
- Adaptive Compression of Large Vision Models for Efficient Image Quality Assessment of AI-Generated ContentShivam Bhardwaj, Tushar Shinde. 3140-3147 [doi]
- Decay Pruning Method: Smooth Pruning with a Self-Rectifying ProcedureMinghao Yang, Pengyuan Li 0001, Linlin Gao, Zhiying Cui, Wenbo Li, Jiangjian Xiao. 3148-3154 [doi]
- Latent Representation of Microstructures Using Variational Autoencoders with Spatial Statistics-Space LossAndy Cai, Sayed Sajad Hashemi, Noah H. Paulson, Michael Guerzhoy. 3155-3161 [doi]
- FDAL: Leveraging Feature Distillation for Efficient and Task-Aware Active LearningRebati Raman Gaire, Arman Roohi. 3162-3169 [doi]
- From Coarse to Fine: Learnable Discrete Wavelet Transforms for Efficient 3D Gaussian SplattingHung Nguyen, An Le, Blark Runfa Li, Truong Nguyen. 3170-3179 [doi]
- Compressed Diffusion: Pruning with Knowledge Distillation for Efficient Text-to-Image GenerationNasrin Kalanat, Ohiremen Dibua, Yan Kang, Yifan Gong 0012, Xiaowei Jia. 3180-3188 [doi]
- ECLR'25: 2nd Workshop on Efficient Computing Under Limited Resources: Visual ComputingJinyang Guo, Zhenghao Chen, Yuqing Ma, Yifu Ding, Xianglong Liu 0001, Jinman Kim, Wanli Ouyang, Dacheng Tao. 3189-3192 [doi]
- MAYA: Multi-Attack Yielding Augmentation for Unified Face Attack DetectionTaehoon Kim, Jongwook Choi, Seungjin Jung, Jongwon Choi. 3193-3201 [doi]
- UniAttackData+: Unified Physical-Digital Attack Detection+ ChallengeHui Ma 0018, Ajian Liu 0001, Yongze Li, Chuanbiao Song, Xiao Guo, Changtao Miao, Wanyi Zhuang, Junze Zheng, Shunxin Chen, Yan Hong 0001, Jiabao Guo, Jiankang Deng, Jun Lan, Weiqiang Wang 0002, Tao Gong, Qi Chu 0001, Sergio Escalera, Hugo Jair Escalante, Xiaoming Liu 0002, Zhen Lei 0001, Isabelle Guyon, Yanyan Liang 0001, Jun Wan 0001. 3202-3211 [doi]
- Oculus: Hierarchical Face Spoof Detection via Frequency-Enhanced Vision Transformers with Group-Aware Classification and Post-Fusion AttentionVincent Jacob Whannou de Dravo, Kiran Raja, Mohammed Bouzidi, Limamou Gueye, Jon Yngve Hardeberg. 3212-3220 [doi]
- A Generalizable Face Security Detection Method via Unified Texture and Semantic Feature FrameworkKe-Yue Zhang, Ruoxin Chen, Jiamu Sun, Jiangming Wang, Hao Yang, Taiping Yao, Shouhong Ding. 3221-3229 [doi]
- Applying Semantic Anchor in Face Anti-Spoofing Detection for Unified Physical-Digital AttacksXu Yang, Qi Zhang, Yaowen Xu, Hui Ma, Zhaofan Zou, Hao Sun. 3230-3238 [doi]
- CrossGAP: Unified Face Anti-Spoofing via Cross-Modal Global-Aware PromptingXihong Chen, Yutong Xie, Hui Ma 0018, Ning Li, Yanyan Liang 0001, Jiabao Guo. 3239-3246 [doi]
- Paired-Sampling Contrastive Framework for Joint Physical-Digital Face Attack DetectionAndrei Balykin, Anvar Ganiev, Denis Kondranin, Kirill Polevoda, Nikolai Liudkevich, Artem Petrov. 3247-3254 [doi]
- Adaptive Face Anti-Spoofing Through Enhanced Data Manipulation and Feature Contrast TechniquesJielong Wang, Kaifeng Huang, Zehua Lan, Senxu He, Minzhe Huang, Ruyi Cai. 3255-3261 [doi]
- Center-Guided Feature Selection with Dimensionality Reduction for Face Anti-SpoofingSenxu He, Minzhe Huang, Ruyi Cai, Jielong Wang, Kaifeng Huang, Zehua Lan. 3262-3269 [doi]
- Iterative Binary TrainingVitor da Silva, Rosana Tomás, Fernando Cores, Francesc Giné. 3270-3278 [doi]
- One Model, All Attacks: ACER-Optimized Vision Transformer for Robust Face Anti-SpoofingJie Jin, Mahiro Tokumasu, Yu Makino, Shumpei Nishikawa, Tetsushi Ohki. 3279-3286 [doi]
- Optimizing DINOv2 with Registers for Face Anti-SpoofingMika Feng, Pierre Gallin-Martel, Koichi Ito 0001, Takafumi Aoki. 3287-3293 [doi]
- Spoofing-aware Prompt Learning for Unified Physical-Digital Facial Attack DetectionJiabao Guo, Yadian Wang, Hui Ma 0018, Yuhao Fu, Ju Jia, Hui Liu, Shengeng Tang, Lechao Cheng, Yufeng Diao, Ajian Liu 0001. 3294-3302 [doi]
- TSANet: Unified Face Attack Detection via Triplet Embedding and Soft ACER LossGabrial Zencha Ashungafac, Miracle James Oloutuche, Mukar Wepngong Eleanor, Blessed Guda, Assane Gueye, Moise Busogi. 3303-3310 [doi]
- TRICKY 2025 Challenge on Monocular Depth from Images of Specular and Transparent SurfacesPierluigi Zama Ramirez, Alex Costanzino, Fabio Tosi, Matteo Poggi, Luigi di Stefano, Jean-Baptiste Weibel, Doris Antensteiner, Markus Vincze, Benjamin Busam, Guangyao Zhai, Weihang Li, Junwen Huang, Hyunjun Jung, Mykola Lavreniuk, Pihai Sun, Yijun Luo, Hongtao Wang, Manying Gao, Kui Jiang, Junjun Jiang. 3311-3322 [doi]
- TRICKY 2025 HouseCat6D Object Pose Estimation Challenge with Specular and Transparent SurfacesWeihang Li, Junwen Huang, Hyunjun Jung, Guangyao Zhai, Pierluigi Zama Ramirez, Alex Costanzino, Fabio Tosi, Matteo Poggi, Luigi di Stefano, Jean-Baptiste Weibel, Doris Antensteiner, Markus Vincze, Jing He, Yiqing Wang, Kexin Zhang 0003, Licheng Jiao, Lingling Li 0002, Fang Liu 0001, Wenping Ma 0001, Benjamin Busam. 3323-3333 [doi]
- Better Supervised Fine-tuning for VQA: Integer-Only LossBaihong Qian, Haotian Fan, Wenjie Liao, Yunqiu Wang, Tao Li, Junhui Cui. 3334-3343 [doi]
- VQualA 2025 Document Image Quality Assessment ChallengeFan Huang, Xiongkuo Min, Zhichao Ma, Xiaohong Liu, Chris Wei Zhou, Guangtao Zhai, Junjie Gao, Runze Liu, Yingzhe Peng, Shujian Yang, Jin Zhang, Kai Yang, Zhiyuan You, Michael Ao, Yicheng Wu, Weixia Zhang, JunLin Chen, Wei Sun, Zhihua Wang, Zhe Zhang, Yang Yang, Mingying Bai, Jiawang Du, Zilong Lu, Zhenyu Jiang, Ziguan Cui, Zongliang Gan, Guijin Tang, Fan Yang, Hang Ouyang, Zhuohang Shi, Tianxin Xiao, Zhizun Luo, Zhaowang Wu, Kaixin Deng, Ruikun Zhang, Hao Yang, Liyuan Pan. 3344-3353 [doi]
- Hybrid Vision Transformer and Convolutional Neural Network for Super-Resolution Image Quality AssessmentXinyu Li, Chuanbiao Song, Chenqi Zhang, Jun Lan, Huijia Zhu, Weiqiang Wang 0002, Xiaoyan Sun. 3354-3361 [doi]
- PKU-AIGIQA-4K: A Perceptual Quality Assessment Database for Both Text-to-Image and Image-to-Image AI-Generated ImagesJiquan Yuan, Jihe Li, Fanyi Yang, Xinyan Cao, Jinming Che, Jinlong Lin, Xixin Cao. 3362-3371 [doi]
- VQualA 2025 Challenge on Image Super-Resolution Generated Content Quality Assessment: Methods and ResultsYixiao Li, Xin Li, Chris Wei Zhou, Shuo Xing, Hadi Amirpour, Xiaoshuai Hao, Guanghui Yue 0001, Baoquan Zhao, Weide Liu, Xiaoyuan Yang, Zhengzhong Tu, Xinyu Li, Chuanbiao Song, Chenqi Zhang, Jun Lan, Huijia Zhu, Weiqiang Wang 0002, Xiaoyan Sun, Shishun Tian, Dongyang Yan, Weixia Zhang, JunLin Chen, Wei Sun, Zhihua Wang, Zhuohang Shi, Zhizun Luo, Hang Ouyang, Tianxin Xiao, Fan Yang, Zhaowang Wu, Kaixin Deng. 3372-3382 [doi]
- VQualA 2025 Challenge on Visual Quality Comparison for Large Multimodal Models: Methods and ResultsHanwei Zhu, Haoning Wu, Zicheng Zhang, Lingyu Zhu 0006, Yixuan Li, Peilin Chen 0001, Shiqi Wang 0001, Chris Wei Zhou, Linhan Cao, Wei Sun 0029, Xiangyang Zhu, Weixia Zhang, Yucheng Zhu, Jing Liu, Dandan Zhu 0001, Guangtao Zhai, Xiongkuo Min, Zhichao Zhang, Xinyue Li, Shubo Xu, Anh Dao, Yifan Li, Hongyuan Yu, Jiaojiao Yi, Yiding Tian, Yupeng Wu, Feiran Sun, Lijuan Liao, Song Jiang. 3383-3393 [doi]
- Efficient Face Image Quality Assessment via Self-Training and Knowledge DistillationWei Sun 0029, Weixia Zhang, Linhan Cao, Jun Jia, Xiangyang Zhu, Dandan Zhu 0001, Xiongkuo Min, Guangtao Zhai. 3394-3402 [doi]
- Perceptual Classifiers: Detecting Generative Images Using Perceptual FeaturesKrishna Srikar Durbha, Asvin Kumar Venkataramanan, Rajesh Sureddi, Alan C. Bovik. 3403-3413 [doi]
- AIGVQA: A Unified Framework for Multi-Dimensional Quality Assessment of AI-Generated VideoJiarui Wang, Juntong Wang, Xiaorong Zhu, Huiyu Duan, Guangtao Zhai, Xiongkuo Min. 3414-3421 [doi]
- VQualA 2025 Challenge on GenAI-Bench AIGC Video Quality Assessment: Methods and ResultsYing Chen, Huasheng Wang, PengXiang Xiao, Yukang Ding, Enpeng Liu, Chris Wei Zhou, Baojun Li, Jiamian Huang, Jiarui Wang, Xiaorong Zhu, Juntong Wang, Huiyu Duan, Xiongkuo Min, Qiang Hu 0003, Chunlei Cai, Guangtao Zhai, Baihong Qian, Haotian Fan, Wenjie Liao, Yunqiu Wang, Tao Li, Junuhi Cui, Zhichao Zhang, Xinyue Li, Yunhao Li, Xiaohong Liu, Weixia Zhang, Bingkun Zheng, Wei Sun 0029, Zhihua Wang, Longwei Li, Jinyu Zhao, Xincheng Lv, Yang Cai, Fangfang Lu, Ritik Bompilwar, Saurabh Koshatwar, Weifeng Cai, Guangqian Kong, Junfeng Yang, Jing Fu 0005, Wei Zhang, Wenzhi Cao, Limei Liu, Qin Li, Wanli Ma, Yixiao Li, Xiaoshuai Hao. 3422-3431 [doi]
- VQualA 2025 Challenge on Engagement Prediction for Short Videos: Methods and ResultsDasong Li, Sizhuo Ma, Hang Hua, Wenjie Li, Jian Wang, Chris Wei Zhou, Fengbin Guan, Xin Li, Zihao Yu, Yiting Lu, Ru-Ling Liao, Yan Ye, Zhibo Chen 0001, Wei Sun 0029, Linhan Cao, Yuqin Cao, Weixia Zhang, Wen Wen 0007, Kaiwei Zhang, Zijian Chen 0001, Fangfang Lu, Xiongkuo Min, Guangtao Zhai, Erjia Xiao, LingFeng Zhang, Zhenjie Su, Hao Cheng, Yu Liu, Renjing Xu, Long Chen, Xiaoshuai Hao, Zhenpeng Zeng, Jianain Wu, Xuxu Wang, Oian Yu, Bo Hu, Weiwei Wang, Pinxin Liu, Yunlong Tang, Luchuan Song, Jinxi He, Jiaru Wu, Hanjia Lyu. 3432-3442 [doi]
- Engagement Prediction of Short Videos with Large Multimodal ModelsWei Sun 0029, Linhan Cao, Yuqin Cao, Weixia Zhang, Wen Wen 0007, Kai Zhang 0008, Zijian Chen 0001, Fangfang Lu, Xiongkuo Min, Guangtao Zhai. 3443-3452 [doi]
- Understanding Perceptual Quality in CCTV Images: A Benchmark Dataset and Entropy-Based InsightsYujin Han, Jiwoo Kang, Sanghoon Lee, Taewan Kim. 3453-3460 [doi]
- A Lightweight Ensemble-Based Face Image Quality Assessment Method with Correlation-Aware LossMohammadAli Hamidi, Hadi Amirpour, Luigi Atzori, Christian Timmerer. 3461-3468 [doi]
- DeQA-Doc: Adapting DeQA-Score to Document Image Quality AssessmentJunjie Gao, Runze Liu, Yingzhe Peng, Shujian Yang, Jin Zhang, Kai Yang, Zhiyuan You. 3469-3478 [doi]
- QualiVision: Multi-Modal Video Quality Assessment with Quality-Aware Fusion and Discriminative Learning StrategiesRitik Bompilwar, Saurabh Koshatwar. 3479-3488 [doi]
- VQualA 2025 Challenge on Face Image Quality Assessment: Methods and ResultsSizhuo Ma, Wei-Ting Chen, Qiang Gao, Jian Wang, Chris Wei Zhou, Wei Sun 0029, Weixia Zhang, Linhan Cao, Jun Jia, Xiangyang Zhu, Dandan Zhu 0001, Xiongkuo Min, Guangtao Zhai, Baoying Chen, Xiongwei Xiao, Jishen Zeng, Wei Wu 0002, Tiexuan Lou, Yuchen Tan, Chunyi Song, Zhiwei Xu, MohammadAli Hamidi, Hadi Amirpour, MingYin Bai, Jiawang Du, Zhenyu Jiang, Zilong Lu, Ziguan Cui, Zongliang Gan, Xinpeng Li, Shiqi Jiang, Chenhui Li, Changbo Wang, Weijun Yuan, Zhan Li, Yihang Chen, Yifan Deng, Ruting Deng, Zhanglu Chen, Boyang Yao, Shuling Zheng, Feng Zhang 0047, Zhiheng Fu, Abhishek Joshi, Aman Agarwal, Rakhil Immidisetti, Ajay Narasimha Mopidevi, Vishwajeet Shukla, Hao Yang, Ruikun Zhang, Liyuan Pan, Kaixin Deng, Hang Ouyang, Fan Yang, Zhizun Luo, Zhuohang Shi, Songning Lai, Weilin Ruan, Yutao Yue. 3489-3498 [doi]
- MMF-QE: Advanced Multi-Modal Fusion for Quality Assessment and Engagement Prediction in User-Generated Short VideosFengbin Guan, Zihao Yu, Yiting Lu, Wei Luo, Yixin Gao, Xin Li 0082, Zhibo Chen 0001. 3499-3506 [doi]
- RankCORE: Self-Supervised Ranking-Aware Correlation Optimized Regression for Face Image Quality AssessmentAbhishek Joshi, Aman Agarwal. 3507-3515 [doi]
- Exploring MLLM in Fine-Grained Visual Quality Comparison with Quality TokenSong Jiang, Lijuan Jiao, Yuguang Li, Pei-you, Hui Li, Ran Yang, Feng Zhu, Sehwan Ki, Hyong-Euk Lee. 3516-3526 [doi]
- Decoder-Aware Self-Supervised Continual Pretraining and Uncertainty-Guided Pseudo-Labeling for Wheat Organ SegmentationTapotosh Ghosh, Md Jaber Al Nahian, Farnaz Sheikhi, Farhad Maleki. 3516-3526 [doi]
- Self-Supervised Representation Learning with Diffusion-Based Refinement for Image RectificationPooja Kumari 0001, Sukhendu Das. 3527-3535 [doi]
- Efficient Additive Attention for Transformer-Based Semi-Supervised Document Layout AnalysisTahira Shehzadi, Ifza, Didier Stricker, Muhammad Zeshan Afzal. 3536-3544 [doi]
- Pose to Protect: Federated Skeleton-Based Anomaly Detection for Privacy-Conscious Video SurveillanceDina Famouri, Md. Zarif Hossain, Ahmed Imteaj. 3545-3553 [doi]
- MG-Gen: Single Image to Motion Graphics GenerationTakahiro Shirakawa, Tomoyuki Suzuki, Takuto Narumoto, Daichi Haraguchi. 3554-3563 [doi]
- Embedding Font Impression Word Tags Based on Co-occurrenceYugo Kubota, Seiichi Uchida. 3564-3573 [doi]
- A Comprehensive Library for Benchmarking Multi-Class Visual Anomaly DetectionJiangning Zhang, Haoyang He, Zhenye Gan, Qingdong He, Yuxuan Cai, Zhucun Xue, Yabiao Wang, Chengjie Wang 0001, Lei Xie 0007, Yong Liu 0032. 3574-3583 [doi]
- Bridge Feature Matching and Cross-Modal Alignment with Mutual-Filtering for Zero-Shot Anomaly DetectionYuhu Bai, Jiangning Zhang, Yunkang Cao, Guangyuan Lu, Qingdong He, Xiangtai Li, Guanzhong Tian. 3584-3593 [doi]
- DinoAtten3D: Slice-Level Attention Aggregation of DinoV2 for 3D Brain MRI Anomaly ClassificationFazle Rafsani, Jay Shah, Catherine D. Chong, Todd J. Schwedt, Teresa Wu. 3594-3603 [doi]
- Zero-Shot Image Anomaly Detection Using Generative Foundation ModelsLemar Abdi, M. M. Amaan Valiuddin, Francisco Caetano, Christiaan G. A. Viviers, Fons van der Sommen. 3604-3613 [doi]
- Automated Feature Tracking for Real-Time Kinematic Analysis and Shape Estimation of Carbon Nanotube GrowthKaveh Safavigerdini, Ramakrishna Surya, Jaired Collins, Prasad Calyam, Filiz Bunyak, Matthew R. Maschmann, Kannappan Palaniappan. 3614-3623 [doi]
- Neural Object Detection for 4D-STEM: High-Throughput Sub-Pixel Electron Diffraction Pattern RecognitionArda Genc, Ravit Silverstein. 3624-3634 [doi]
- MatSSL: Robust Self-Supervised Representation Learning for Metallographic Image SegmentationHoang-Hai-Nam Nguyen, Phan Nguyen Duc Hieu, Ho Won Lee. 3635-3642 [doi]
- Application of Semantic Segmentation on SEM Images of Rubber Composites and Verification of Correlation with Material PropertiesSho Sakamoto, Wakana Miyoshi, Kentaro Oishi, Ryuta Asami, Akira Kobayashi, Hiroki Iwana, Taichi Tokunaga, Yosuke Motohashi. 3643-3651 [doi]
- Unveiling Process-Structure Mapping with a Deep Variational AutoencoderAvanish Mishra, Brenden William Hamilton, Mashroor Shafat Nitol, Nithin Mathew, Kipton Barros, Timothy C. Germann, Saryu Fensin. 3652-3661 [doi]
- Improving Multislice Electron Ptychography with a Generative PriorChristian K. Belardi, Chia-Hao Lee, Yingheng Wang, Justin Lovelace, Kilian Q. Weinberger, David A. Muller, Carla P. Gomes. 3662-3672 [doi]
- SAM- and μSAM-Based Inference of Nuclear Materials Processing History from SEM ImageryMayolo Valencia Mendoza, Alexei N. Skurikhin, Judith D. Cohn, Luther McDonald, Kari Sentz. 3673-3680 [doi]
- Improving U-Net Confidence on TEM Image Data with L2-Regularization, Transfer Learning, and Deep Fine-TuningAiden Ochoa, Xinyuan Xu, Xing Wang. 3681-3689 [doi]
- AtomDiffuser: Time-Aware Degradation Modeling for Drift and Beam Damage in STEM ImagingHao Wang, Hongkui Zheng, Kai He, Abolfazl Razi. 3690-3699 [doi]
- AI-Driven Analysis of Fracture Types in Metallic Materials: Classification, Segmentation, and Quantification with Deep Neural NetworksAnna Wójcicka, Mateusz Wojtulewicz, Andrzej Brodzicki, Magdalena Leonkiewicz, Patryk Maciejewski, Joanna Jaworek-Korjakowska, Krzysztof Mroczka. 3700-3708 [doi]
- A Transformer-Based GAN Architecture for Simulating Carbon Nanotube Structures in the Frequency DomainTrevor Bohl, Minasadat Attari, Matthew R. Maschmann, Filiz Bunyak. 3709-3714 [doi]
- Foundation Versus Domain-Specific Models: Performance Comparison, Fusion, and Explainability in Face RecognitionRedwan Sony, Parisa Farmanifard, Arun Ross, Anil K. Jain 0001. 3715-3725 [doi]
- GaitCrafter: Diffusion Model for Biometric Preserving Gait SynthesisSirshapan Mitra, Yogesh S. Rawat. 3726-3735 [doi]
- FaceLLM: A Multimodal Large Language Model for Face UnderstandingHatef Otroshi-Shahreza, Sébastien Marcel. 3736-3746 [doi]
- A Multi-domain Image Translative Diffusion StyleGAN for Iris Presentation Attack DetectionShivangi Yadav, Arun Ross. 3747-3756 [doi]
- FLUXSynID: A Framework for Identity-Controlled Synthetic Face Generation with Document and Live ImagesRaul Ismayilov, Dzemila Sero, Luuk J. Spreeuwers. 3757-3767 [doi]
- Fusing Convolution and Vision Transformer Encoders for Object Height Estimation from Monocular Satellite and Aerial ImagesFurkan Gültekin, Alper Koz, Reza Bahmanyar, Seyed Majid Azimi, Mehmet Lütfi Süzen. 3768-3777 [doi]
- Tile and Slide: A New Framework for Scaling NeRF from Local to Global 3D Earth ObservationCamille Billouard, Dawa Derksen, Alexandre Constantin, Bruno Vallet. 3778-3788 [doi]
- 3DGS-to-PC: 3D Gaussian Splatting to Dense Point CloudsLewis A. G. Stuart, Andrew Morton, Ian Stavness, Michael P. Pound. 3789-3798 [doi]
- MIRAGE: Unsupervised Single Image to Novel View Generation with Cross Attention GuidanceLlukman Cerkezi, Aram Davtyan, Sepehr Sameni, Paolo Favaro. 3799-3809 [doi]
- Adapting Stereo Vision from Objects to 3D Lunar Surface Reconstruction with the StereoLunar DatasetClementine Grethen, Simone Gasparini, Géraldine Morin, Jérémy Lebreton, Lucas Marti, Manuel Sanchez-Gestido. 3810-3819 [doi]
- Re: Verse - Can Your VLM Read a Manga?Aaditya Baranwal, Madhav Kataria, Naitik Agarwal, Yogesh Singh Rawat, Shruti Vyas. 3820-3830 [doi]
- Generating Visually Consistent Images for Storytelling via Narrative Graph PromptingAndrew Shin, Kunitake Kaneko. 3831-3836 [doi]
- Generative AI for Cel-Animation: A SurveyYunlong Tang 0002, Junjia Guo, Pinxin Liu, Zhiyuan Wang, Hang Hua, Jia-Xing Zhong, Yunzhong Xiao, Chao Huang 0033, Luchuan Song, Susan Liang, Yizhi Song, Liu He, Jing Bi 0002, Mingqian Feng, Xinyang Li, Zeliang Zhang 0001, Chenliang Xu. 3837-3850 [doi]
- From Sound to Sight: Towards AI-authored Music VideosLeo Vitasovic, Stella Graßhof, Agnes Mercedes Kloft, Ville V. Lehtola, Martin Cunneen, Justyna Starostka, Glenn McGarry, Kun Li 0024, Sami S. Brandt. 3851-3861 [doi]
- TransFlow: Motion Knowledge Transfer from Video Diffusion Models to Video Salient Object DetectionSuhwan Cho, Minhyeok Lee, Jungho Lee, Sunghun Yang, Sangyoun Lee. 3862-3872 [doi]
- Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object SegmentationSuhwan Cho, Seunghoon Lee, Minhyeok Lee, Jungho Lee, Sangyoun Lee. 3873-3883 [doi]
- DepthFlow: Exploiting Depth-Flow Structural Correlations for Unsupervised Video Object SegmentationSuhwan Cho, Minhyeok Lee, Jungho Lee, Donghyeong Kim, Sangyoun Lee. 3884-3894 [doi]
- Local2Global Query Alignment for Video Instance SegmentationRajat Koner, Zhipeng Wang, Srinivas Parthasarathy, Chinghang Chen. 3895-3904 [doi]
- Beyond Appearance: Geometric Cues for Robust Video Instance SegmentationQuanzhu Niu, Yikang Zhou, Shihao Chen, Tao Zhang 0042, Shunping Ji. 3905-3915 [doi]
- MIPI 2025 Challenge on Detailed Image Quality Assessment: Methods and ResultsWenjie Liao, Haotian Fan, Yifang Xu, Meijia Song, Qiufang Ma, Shuhao Han, Chunle Guo, Chongyi Li, Jianhui Sun, Xinli Yue, Yuhao Xie, Tao Shao, Zhaoran Zhao, Xinjun Ma, Lu Liu 0005, Chunlei Cai, Qiang Hu 0003, Shaocheng Shen, Huiyu Duan, Tianxiao Ye, Xiaoyun Zhang, Hong Yi, Yupeng Zhang, Linnan Zhao, Xinyi You, Ziang Li, Chenhao Qiu, Alireza Talebpour, Azadeh Mansouri, Ahmad Mahmoudi Aznaveh, Hossein Motamednia. 3916-3923 [doi]
- MIPI 2025 Challenge on Deblurring for Hybrid EVS Camera: Methods and ResultsYaqi Wu, Zhihao Fan, Hirotaka Shinozaki, Frank Zhang, Xander Li, Alexis Baudron, Wenbin Feng, Shuang Zhao, Jin Han, Cheng Li, Yihui Shi, Dehua Song, Zheng Chen, Wenbo Li, Fenglong Song, Yihong Leng, Siming Zheng, Peng-Tao Jiang, Linxiao Shi, Jinwei Chen, Bo Li, Jiaojiao Li, Cheng Li, Jinao Song, Yan Chen, Yajing Wei, Yuqiang Yang, Jian Tang, Long Bao, Heng Sun, Yang Qiu, Kean Liu, Senyan Xu, Zhijing Sun, Jiaying Zhu, Chengjie Ge, Xingbo Wang, Yidi Liu, Xin Lu, Xueyang Fu, Zheng-Jun Zha, Qinglin Liu, Wei Yu, Yunfan Lu, Xiangming Wang, Shuqi Ren, Haijin Zeng, Jinchao Li, Shiyang Zhou, Yongyong Chen, Qichao Dong, Junqi Lu. 3924-3934 [doi]
- Monochromatic Event Guided Image Deblurring with Event-Triggering-Aware DecompositionMinggui Teng, Boyu Li, Yixin Yang 0008, Chu Zhou, Yan Chen, Jimmy S. Ren, Boxin Shi. 3935-3944 [doi]
- Learning Camera-Agnostic White-Balance PreferencesLuxi Zhao 0002, Mahmoud Afifi, Michael S. Brown. 3945-3954 [doi]
- Syncalize - Precise Automatic Timecodes for Video and Audio Devices Using Consumer HardwareOliver Zendel 0001, Matthias Schörghuber, Clemens Hofbauer. 3955-3964 [doi]
- AesCrop: Aesthetic-Driven Cropping Guided by CompositionYen-Hong Wong, Lai-Kuan Wong. 3965-3972 [doi]
- GuideSR: Rethinking Guidance for One-Step High-Fidelity Diffusion-Based Super-ResolutionAditya Arora, Zhengzhong Tu, Yufei Wang, Ruizheng Bai, Jian Wang, Sizhuo Ma. 3973-3981 [doi]
- EGVD: Event-Guided Video Diffusion Model for Physically Realistic Large-Motion Frame InterpolationZiran Zhang, Xiaohui Li, Yihao Liu, Yujin Wang, Yueting Chen, Tianfan Xue, Shi Guo. 3982-3991 [doi]
- HiLLIE: Human-in-the-Loop Training for Low-Light Image EnhancementXiaorui Zhao, Xinyue Zhou, Peibei Cao, Junyu Lou, Shuhang Gu. 3992-4002 [doi]
- iDETEX: Empowering MLLMs for Intelligent DETailed EXplainable IQAZhaoran Zhao, Xinli Yue, Jianhui Sun, Yuhao Xie, Tao Shao, Liangchao Yao, Fan Xia, Yuetang Deng. 4003-4012 [doi]
- Predictive Quality Assessment for Mobile Secure GraphicsCas Steigstra, Sergey Milyaev, Shaodi You. 4013-4022 [doi]
- Diffusion Autoencoders are Foundation Video CompressorsNiccolò Niccoli, Leonardo Galteri, Lorenzo Seidenari. 4023-4031 [doi]
- Probabilistic Dynamic Quantization for Memory Constrained DevicesGabriele Santini, Francesco Paissan, Elisabetta Farella. 4032-4041 [doi]
- Extreme Compression of Adaptive Neural ImagesLeo Hoshikawa, Marcos V. Conde, Takeshi Ohashi 0002, Atsushi Irie. 4042-4052 [doi]
- HC-PTQ: Poincaré-Based Hyperbolic Clustering for Data-Free Quantization of Vision TransformersRaffaele Mineo, Simone Palazzo, Concetto Spampinato, Francesco Rundo. 4053-4060 [doi]
- Enhancing Generalization in Data-Free Quantization via Mixup-Class PromptingJiwoong Park, Chaeun Lee, Yongseok Choi, Sein Park, Deokki Hong, Jungwook Choi. 4061-4070 [doi]
- PTQAT: A Hybrid Parameter-Efficient Quantization Algorithm for 3D Perception TasksXinhao Wang, Zhiwei Lin, Zhongyu Xia, Yongtao Wang. 4071-4081 [doi]
- MoPEQ: Mixture of Mixed Precision Quantized ExpertsKrishna Teja Chitty-Venkata, Jie Ye, Murali Emani. 4082-4091 [doi]
- Exploiting Information Redundancy in Attention Maps for Extreme Quantization of Vision TransformersLucas Maisonnave, Karim Haroun, Tom Pégeot. 4092-4100 [doi]
- PREFILT: Prefiltering for Fully Quantized Image Restoration Neural NetworksDenis Makhov, Ruslan Ostapets, Irina Zhelavskaya, Dehua Song. 4101-4110 [doi]
- Binary SqueezeNet: Enhancing Parameter Efficiency in Binary Neural NetworksSalih Atabey, Erdem Akagündüz. 4111-4120 [doi]
- Document Haystack: A Long Context Multimodal Image/Document Understanding Vision LLM BenchmarkGoeric Huybrechts, Srikanth Ronanki, Sai Muralidhar Jayanthi, Jack FitzGerald, Srinivasan Veeravanallur. 4121-4129 [doi]
- Refining Skewed Perceptions in Vision-Language Contrastive Models through Visual RepresentationsHaocheng Dai, Sarang C. Joshi. 4130-4139 [doi]
- Med-GRIM: Enhanced Zero-Shot Medical VQA Using Prompt-Embedded Multimodal Graph RAGRakesh Raj Madavan, Akshat Kaimal, Hashim Faisal, Chandrakala Shanmuganathan. 4140-4150 [doi]
- Chrono: A Simple Blueprint for Representing Time in MLLMsBoris Meinardus, Hector G. Rodriguez, Anil Batra, Anna Rohrbach, Marcus Rohrbach. 4151-4156 [doi]
- Towards Reporting Bias in Visual-Language Datasets: Bi-Modal Data Augmentation by Decoupling Object-Attribute AssociationQiyu Wu 0001, Mengjie Zhao, Yutong He, Lang Huang 0001, Junya Ono, Hiromi Wakaki, Yuki Mitsufuji. 4157-4166 [doi]
- Global-to-Local or Local-to-Global? Enhancing Image Retrieval with Efficient Local Search and Effective Global Re-RankingDror Aiger, Bingyi Cao, Kaifeng Chen, André Araújo 0001. 4167-4176 [doi]
- Smart Routing for Multimodal Video Retrieval: When to Search WhatKevin Dela Rosa. 4177-4185 [doi]
- IRR-LMM: Improving On-Demand Retail Recommendation with Large Multi-Modal ModelsYihao Zhao, Nan Lai, Xiaoming Li, Xu Yan, Wenhao Deng, Hujiang Huang, Shuai Zhang, Wei Lin. 4186-4195 [doi]
- Visual Adaptive Prompting for Compositional Zero-Shot LearningKyle Stein, Andrew Arash Mahyari, Guillermo Francia, Eman El-Sheikh. 4196-4205 [doi]
- Rate-Distortion Limits for Multimodal Retrieval: Theory, Optimal Codes, and Finite-Sample GuaranteesThomas Y. Chen. 4206-4215 [doi]
- MIND-RAG: Multimodal Context-Aware and Intent-Aware Retrieval-Augmented Generation for Educational PublicationsJiayang Yu, Yuxi Xie, Guixuan Zhang, Jie Liu 0028, Zhi Zeng, Ying Huang, Shuwu Zhang. 4216-4223 [doi]
- Mitigating Language Confusion for Multimodal Foundation Models via Confusion-Aware Preference Optimization PipelineSeunghyun Hwang, Sungjun Lim, Soyeon Shin, Hyun-Geun Kim, Jungwon Lim, Juncheol Kim, Byungseok Kang, Daewoo Myoung. 4224-4233 [doi]
- Enhancing Circuit Diagram Understanding via Near Sight Correction Using VLMsShreyas Kulkarni, Vivek Kumar, Remish Leonard Minz, Munender Varshney, Thiruvengadam Samon, Abhishek Mitra, Nikhil Kulkarni, Nilanjan Chakravortty, Prateek Mital, Kingshuk Banerjee. 4234-4242 [doi]
- Uncertainty-Aware ControlNet: Bridging Domain Gaps with Synthetic Image GenerationJoshua Niemeijer, Jan Ehrhardt, Heinz Handels, Hristina Uzunova. 4243-4252 [doi]
- Evaluating Variance in Visual Question Answering BenchmarksNikitha SR. 4253-4262 [doi]
- Low-Rank Prompt Adaptation for Open-Vocabulary Object DetectionZekun Zhang, Vu Quang Truong, Minh Hoai. 4263-4274 [doi]
- Vocabulary-Free Fine-Grained Visual Recognition via Enriched Contextually Grounded Vision-Language ModelDmitry Demidov, Zaigham Zaheer, Omkar Thawakar, Salman H. Khan 0001, Fahad Shahbaz Khan. 4275-4284 [doi]
- Infusing Fine-Grained Visual Knowledge to Vision-Language ModelsNikolaos-Antonios Ypsilantis, Kaifeng Chen, André Araújo 0001, Ondrej Chum. 4285-4294 [doi]
- MORFI: Multimodal Zero-Shot Reasoning for Financial Time-Series InferenceTina Khezresmaeilzadeh, Parsa Razmara, Mohammad Erfan Sadeghi, Seyedarmin Azizi, Erfan Baghaei Potraghloo. 4295-4304 [doi]
- Audio-Visual LLM for Video UnderstandingFangxun Shu, Lei Zhang, Hao Jiang, Cihang Xie. 4305-4314 [doi]
- What Holds Back Open-Vocabulary Segmentation?Josip Saric, Ivan Martinovic, Matej Kristan, Sinisa Segvic. 4315-4325 [doi]
- TULIP: Contrastive Image-Text Learning with Richer Vision UnderstandingZineng Tang, Long Lian, Seun Eisape, Xudong Wang 0007, Roei Herzig, Adam Yala, Alane Suhr, Trevor Darrell, David M. Chan. 4326-4336 [doi]
- LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction TuningFederico Cocchi, Nicholas Moratelli, Davide Caffagni, Sara Sarto, Lorenzo Baraldi 0001, Marcella Cornia, Rita Cucchiara. 4337-4347 [doi]
- Strefer: Empowering Video LLMs with Space-Time Referring and Reasoning via Synthetic Instruction DataHonglu Zhou, Xiangyu Peng, Shrikant Kendre, Michael S. Ryoo, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles. 4348-4359 [doi]
- CoT-Pose: Chain-of-Thought Reasoning for 3D Pose Generation from Abstract PromptsJunuk Cha, Jihyeon Kim. 4360-4368 [doi]
- GLAD: Generalizable Tuning for Vision-Language ModelsYuqi Peng, Pengfei Wang, Jianzhuang Liu, Shifeng Chen. 4369-4379 [doi]
- A Plug-and-Play Approach for Robust Image Editing in Text-to-Image Diffusion ModelsHyunwook Jo, Jiseung Maeng, Jun-Hyung Park, Namhyuk Ahn, In Kyu Park. 4380-4389 [doi]
- CobraVPS: Code Template Optimization for Better Question Reasoning Accuracy with Visual Program SynthesisJiajing Chen, Xiu Zhang, Yang Li, Renyu Zhang, Yujie Dong, Senem Velipasalar, Jing Zhang. 4390-4399 [doi]
- Learning by Taking Notes: Memory-Guided Continual Learning for Generative Multimodal ModelsYanhui Guo, Chenghuan Guo, Yan Gao, Yi Sun. 4400-4410 [doi]
- TAGS: 3D Tumor-Adaptive Guidance for SAMSirui Li, Linkai Peng, Zheyuan Zhang, Gorkem Durak, Ulas Bagci. 4411-4421 [doi]
- UltraNBA: Neural Bundle Adjustment for Pose Refinement in 3D Freehand UltrasoundVahit Bugra Yesilkaynak, Vanessa Gonzalez Duque, Magdalena Wysocki, Mohammad Farid Azampour, Nassir Navab, Diana Mateus. 4422-4431 [doi]
- Best Foot Forward: Robust Foot Reconstruction in-the-wildKyle Fogarty, Jing Yang, Chayan Kumar Patodi, Jack Foster, Aadi Bhanti, Steven Chacko, Cengiz Öztireli, Ujwal Bonde. 4432-4441 [doi]
- MAPS: A Morphology-Aware PPE Segmentation Framework for Healthcare SettingsWanzhao Yang, Syed Anwar, Beomseok Park, Sifan Yuan, Aleksandra Sarcevic, Marius G. Linguraru, Randall S. Burd, Ivan Marsic. 4442-4450 [doi]
- Automated C-Arm Positioning via Conformal Landmark LocalizationAhmad Arrabi, Jay Hwasung Jung, Jax Luo, Nathan Franssen, Scott Raymond, Safwan Wshah. 4451-4460 [doi]
- Learning Generalizable Diabetic Retinopathy Grading by Decoupled State Space DecodingJingjun Yi, Qi Bi, Hao Zheng 0008, Haolan Zhan, Wei Ji 0011, Huimin Huang 0002, Yuexiang Li, Xian Wu 0001, Yefeng Zheng 0001. 4461-4471 [doi]
- Evaluating and Improving the Effectiveness of Synthetic Chest X-Rays for Medical Image AnalysisEva Prakash, Jeya Maria Jose Valanarasu, Zhihong Chen, Eduardo Pontes Reis, Andrew Johnston, Anuj Pareek, Christian Bluethgen, Sergios Gatidis, Cameron Olsen, Akshay Chaudhari, Andrew Ng, Curtis Langlotz. 4472-4480 [doi]
- Preservation vs. Fabrication: An Ethical Framework of Consent, Transparency, and Integrity for Posthumous AI ArtXiantao Zhang. 4481-4490 [doi]
- Visual Design from Cultural Heritage through Inpainting-Based RestorationJungmin Lee, Guillaume Mougeot. 4491-4496 [doi]
- Style Composition Within Distinct LoRA Modules for Traditional ArtJaeHyun Lee, Wonhark Park, Wonsik Shin, Hyunho Lee, Hyoung Min Na, Nojun Kwak. 4497-4504 [doi]
- HaSPeR: An Image Repository for Hand Shadow Puppet RecognitionSyed Rifat Raiyan, Zibran Zarif Amio, Sabbir Ahmed. 4505-4515 [doi]
- A Proposed XR-based Digital Framework for Nakhwa-nori: Preserving Regional Traditional Festivals and Addressing Safety and Environmental BarrierHatnim Kim, Geonwoo Song, Seungryeol Eom, Seungkwan Choi. 4516-4523 [doi]
- K-StyleLoRA: Information-Guided Image Generation via Selective Feature LearningSoyoung Lee, Hyoungseo Cho, Myungjoo Kang, Youngjoon Yoo. 4524-4531 [doi]
- FashionRNA: Interactive Reimagination of Fashion HeritageHaneol Kim, Jonghwan Bae, Seonwoo Shin, Sanghun Park. 4532-4537 [doi]
- Lost in Translation: A Position Paper on Probing Cultural Bias in Vision-Language Models via Hanbok VQAHyeokJun Lee, Jeonghyo Song, Geonhui Jang, Young Joon Yoo. 4538-4545 [doi]
- Towards a New Copyright Paradigm for Generative AI: Bridging Human-Machine Creativity Through Legal and Policy ReformDaiwon Hyun, SunHo Park, Eutteum Kim. 4546-4553 [doi]
- WCCA-AK: A Multimodal Dataset of André Kim's Fashion Legacy for AI-Driven Cultural Heritage ResearchSeongYeon Oh, Soyoung Lee, Hyeon Seong Jeong, Sangwoo Jo, Jinyoung Kim, Yeonseo Choi, Young Joon Yoo, Taehoon Kim. 4554-4559 [doi]
- Distillation Improves Visual Place Recognition for Low Quality ImagesAnbang Yang, Ge Jin, Junjie Huang, Yao Wang, John-Ross Rizzo, Chen Feng. 4560-4569 [doi]
- From Static to Dynamic: A Survey of Topology-Aware Perception in Autonomous DrivingYixiao Chen, Ruining Yang, Xin Chen, Jia He, Dongliang Xu, Yue Yao. 4570-4582 [doi]
- A Survey on Vision-Language-Action Models for Autonomous DrivingSicong Jiang, Menglin Kong, Yihong Tang, Lijun Sun 0001, Kangan Qian, Ziang Luo, Tianze Zhu, Yunlong Wang, Xin Zhao, Tuopu Wen, Zheng Fu, Yang Zhong, Siwen Jiao, Hao Ye, Zilin Huang, Zihao Sheng, Sikai Chen, Seongjin Choi, Kun Jiang 0002, Diange Yang. 4583-4595 [doi]
- Cross-Camera Module Training of Raw Sensor Data-Based Automotive Machine Vision: Challenges and SolutionsSetareh Kian, Shannon Brooks-Lehnert, Keigo Hirakawa. 4596-4607 [doi]
- SafeRoute: Enhancing Traffic Scene Understanding via a Unified Deep Learning and Multimodal LLMAnkit Kumar Shaw, Chandan Kumar Sah, Xiaoli Lian, Arsalan Shahid Baig, Tuopu Wen, Kun Jiang 0002, Mengmeng Yang 0001, Diange Yang, Li Zhang. 4606-4615 [doi]
- Analyzing Closed-Loop Training Techniques for Realistic Traffic Agent Models in Autonomous Highway Driving SimulationsMatthias Bitzer 0001, Reinis Cimurs, Benjamin Coors, Johannes Goth, Sebastian Ziesche, Philipp Geiger, Maximilian Naumann. 4616-4625 [doi]
- NuScenes-SpatialQA: A Spatial Understanding and Reasoning Benchmark for Vision-Language Models in Autonomous DrivingKexin Tian, Jingrui Mao, YunLong Zhang, Jiwan Jiang, Yang Zhou 0019, Zhengzhong Tu. 4626-4635 [doi]
- TinyBEV: Cross-Modal Knowledge Distillation for Efficient Multi-Task Bird's-Eye-View Perception and PlanningReeshad Khan, John Gauch. 4636-4644 [doi]
- SynSHRP2: A Synthetic Multimodal Benchmark for Driving Safety-Critical Events Derived from Real-World Driving DataLiang Shi, Boyu Jiang, Zhenyuan Yuan, Miguel A. Perez, Feng Guo. 4645-4655 [doi]
- Enhancing Event-Based Optical Camera Communication Via Dynamic Timing CorrectionMatthew Howard, Keigo Hirakawa. 4656-4664 [doi]
- DEIO: Deep Event Inertial OdometryWeipeng Guan, Fuling Lin, Peiyu Chen, Peng Lu 0003. 4665-4674 [doi]
- Neural Ganglion Sensors: Learning Task-specific Event Cameras Inspired by the Neural Circuit of the Human RetinaHaley M. So, Gordon Wetzstein. 4675-4685 [doi]
- Comparing Representations for Event Camera-based Visual Object TrackingOussama Abdul Hay, Sara Alansari, Mohamad Alansari, Yahya H. Zweiri. 4686-4695 [doi]
- Quantifying Accuracy of an Event-Based Star Tracker via Earth's RotationDennis Melamed, Connor Hashemi, Scott McCloskey. 4696-4703 [doi]
- Lattice-Allocated Real-Time Line Segment Feature Detection and Tracking Using Only an Event-Based CameraMikihiro Ikura, Arren Glover, Masayoshi Mizuno, Chiara Bartolozzi. 4704-4713 [doi]
- Event-Based Spinning Object SLAMEthan Elms, Yasir Latif, Tat-Jun Chin. 4714-4723 [doi]
- GraphEnet: Event-Driven Human Pose Estimation with a Graph Neural NetworkGaurvi Goyal, Pham Cong Thuong, Arren Glover, Masayoshi Mizuno, Chiara Bartolozzi. 4724-4733 [doi]
- SIS-Challenge: Event-Based Spatio-Temporal Instance Segmentation Challenge at the CVPR 2025 Event-Based Vision WorkshopFriedhelm Hamann, Emil Mededovic, Fabian Gülhan, Yue Wu 0004, Johannes Stegmaier, Jing He, Yiqing Wang, Kexin Zhang 0003, Lingling Li 0002, Licheng Jiao, Mengru Ma, Hongxiang Huang, Yuhao Yan, Hongwei Ren, Xiaopeng Lin, Yulong Huang, Bojun Cheng, Se Hyun Lee, Gyu-Sung Ham, Kanghan Oh, Gi Hyun Lim, Boxuan Yang, Bowen Du, Guillermo Gallego 0002. 4734-4742 [doi]
- Multimodal Neuromorphic Event-Frame Fusion in Domain-Generalized Vision Transformer for Dynamic Object TrackingTaha Razzaq, Asim Iqbal. 4743-4750 [doi]
- Event-Driven Robust Fitting on Neuromorphic HardwareTam Ngoc-Bang Nguyen, Anh-Dzung Doan, Zhipeng Cai, Tat-Jun Chin. 4751-4761 [doi]
- Drone Detection with Event CamerasGabriele Magrini, Lorenzo Berlincioni, Federico Becattini, Luca Cultrera, Pietro Pala. 4762-4773 [doi]
- Toward Low-SWaP Cognitive Agents: Neuromorphic Intelligence and FPGA-Based Deployments of Event Neural NetworksTrevor Bihl, Rajashree Majumder, Zhewei Wang, Avinash Karanth, Jundong Liu. 4774-4781 [doi]
- OscNet v1.5: Energy Efficient Hopfield Network on CMOS Oscillators for Image ClassificationWenxiao Cai, Zongru Li, Iris Wang, Yu-Neng Wang, Thomas H. Lee. 4792-4800 [doi]
- Human Vision Constrained Super-ResolutionVolodymyr Karpenko, Taimoor Tariq, Jorge Condor, Piotr Didyk. 4801-4810 [doi]
- Long-Tailed Data Classification by Increasing and Decreasing Neurons During TrainingTaigo Sakai, Kazuhiro Hotta. 4811-4819 [doi]
- Towards Human-Like Invariance: Self-Supervised Learning with Feature-Level Rotation AlignmentSangjun Han, Woojin Cheong, Changhoon Song, Myungjoo Kang. 4820-4829 [doi]
- From Neural Activity to Computation: Biological Reservoirs for Pattern Recognition in Digit ClassificationLudovico Iannello, Luca Ciampi, Fabrizio Tonelli, Gabriele Lagani, Lucio Maria Calcagnile, Federico Cremisi, Angelo Di Garbo, Giuseppe Amato 0001. 4830-4839 [doi]
- Using Human Perception to Regularize Transfer LearningSteve Cruz, Justin Dulay, Walter J. Scheirer. 4840-4850 [doi]
- HiSS: Human-Inspired Semantic Segmentation for Vehicle Interior Scene UnderstandingAleksander Kostuch, Joanna Jaworek-Korjakowska. 4851-4860 [doi]
- Co-VisiON: Co-Visibility ReasONing on Sparse Image Sets of Indoor ScenesChao Chen, Nobel Dang, Juexiao Zhang, Wenkai Sun, Pengfei Zheng, Xuhang He, Yimeng Ye, Jiasheng Zhang, Taarun Srinivas, Chen Feng 0002. 4861-4871 [doi]
- Human-Inspired Summarization: Cluster Scene Videos into Diverse FramesChao Chen, Mingzhi Zhu, Ankush Pratap Singh, Yu Yan, Felix Juefei-Xu, Chen Feng 0002. 4872-4882 [doi]
- SeeEEG: Semantic-aware EEG-based Multi-Modal Retrieval-Augmented Generation for High-Fidelity Visual Brain DecodingJunmo Kim 0001, Woohyeok Choi, Sang-Jun Park, Keun-Soo Heo, Young-Han Son, Ji-Hye Oh, Dong-Hee Shin, Tae-Eui Kam. 4883-4892 [doi]
- AttZoom: Attention Zoom for Better Visual FeaturesDaniel DeAlcala, Aythami Morales, Julian Fierrez, Ruben Tolosana. 4893-4902 [doi]
- Who Walks with You Matters: Perceiving Social Interactions with Groups for Pedestrian Trajectory PredictionZiqian Zou, Conghao Wong, Beihao Xia, Xinge You. 4903-4912 [doi]
- Exploring Human-Model Alignment in Visual Social Attention During Help-and-Hinder Social Interaction ClassificationLucia Schiatti, Guido Vallarino, Sabrina M. Lopez, Yen-ling Kuo, Matteo Moro, Mengmi Zhang, Monica Gori, Alessio Del Bue, Boris Katz, Andrei Barbu. 4913-4922 [doi]
- A Comparative Study of RGB-based Continuous Sign Language Recognition TechniquesSarah N. Alyami, Hamzah Luqman. 4923-4932 [doi]
- Development of an Intelligent System for Recognizing Islamic Religious Visual Signs in the Arabic LanguageDuaa Mhnaa, Yaroub Dayoub, Jafar Salman. 4933-4941 [doi]
- Iterative Latent Refinement for Robust Non-Autoregressive Sign Language ProductionTugçe Kiziltepe, Sümeyye Meryem Tasyürek, Hacer Yalim Keles. 4942-4952 [doi]
- Text-Aligned Radar-Based Sign Language Recognition for Healthcare CommunicationRaffaele Mineo, Amelia Sorrenti, Gaia Caligiore, Federica Proietto Salanitri, Giovanni Bellitto, Senya Polikovsky, Sabina Fontana, Egidio Ragonese, Concetto Spampinato, Simone Palazzo. 4953-4961 [doi]
- Inclusive Sign Language AI: Towards Authentic Accessibility Through Community CollaborationAbraham Glasser. 4962-4967 [doi]
- A Closer Look at Skeleton-Based Continuous Sign Language RecognitionYuecong Min, Yifan Yang, Peiqi Jiao, Zixi Nan, Xi-Lin Chen 0001. 4968-4974 [doi]
- Point-Supervised Japanese Fingerspelling Localization via HR-Pro and Contrastive LearningRyota Murai, Naoto Tsuta, Duk Shin, Yousun Kang. 4975-4982 [doi]
- FusionEnsembleNet: An Attention-Based Ensemble of Spatiotemporal Networks for Multimodal Sign Language RecognitionMd. Milon Islam, Md. Rezwanul Haque, S. M. Taslim Uddin Raju, Fakhri Karray. 4983-4989 [doi]
- A Signer-Invariant Conformer and Multi-Scale Fusion Transformer for Continuous Sign Language RecognitionMd. Rezwanul Haque, Md. Milon Islam, S. M. Taslim Uddin Raju, Fakhri Karray. 4990-4999 [doi]
- RF-ChessSIGN: Radar-enabled Human-Computer Interaction in a Real-Time Sign Language-Controlled GameKenneth DeHaan, Emre Kurtoglu, Sabyasachi Biswas, Caroline Kobek Pezzarossi, Darrin J. Griffin, Chris S. Crawford, Ali Cafer Gürbüz, Evie Malaia, Abraham Glasser, Raja Kushalnagar, Sevgi Zübeyde Gürbüz. 5000-5010 [doi]
- SAGE: Segment-Aware Gloss-Free Encoding for Token-Efficient Sign Language TranslationLow Jian He, Ozge Mercanoglu Sincan, Richard Bowden. 5011-5020 [doi]
- Region-Aware Pose Modeling and Permutation Decoding for Signer-Independent Sign Language RecognitionSieu Tran, Duc Nguyen Minh, Truong Nguyen Thanh, Hao Vo, Thanh Duc Ngo, Tien Do, Khiem Le, Duy-Dinh Le. 5021-5029 [doi]
- Generalizable Sign Language Recognition via Local Temporal Convolutions and Region-Aware Pose EncodingSieu Tran, Duc Nguyen Minh, Truong Nguyen Thanh, Hao Vo, Thanh Duc Ngo, Tien Do, Khiem Le, Duy-Dinh Le. 5030-5037 [doi]
- Can a Lightweight Transformer Deliver a Robust Multimodal Sign Language Word Recognition?Eva Berepiki, Philip Ciunkiewicz, Svetlana N. Yanushkevich. 5038-5045 [doi]
- Modality-Specific Benchmarks and Radar Range-Doppler Envelope Classification for Multimodal Isolated Sign Language RecognitionDmitriy Sazonov, Kamrul Islam, Evie Malaia, Sevgi Zubeyde Gurbuz. 5046-5053 [doi]
- A Smart Glove to Convert Gestures to Speech & Text to Assist Deaf & Mute People Using Machine LearningManisha A, Tamilselvi S. 5054-5060 [doi]
- A Multimodal Video and Radar Fusion Framework for High-Accuracy Isolated Sign Language RecognitionSultan Mohammad Manjur, Sabyasachi Biswas, Ali Cafer Gürbüz. 5061-5070 [doi]
- AutoSign: Direct Pose-to-Text Translation for Continuous Sign Language RecognitionSamuel Ebimobowei Johnny, Blessed Guda, Andrew Blayama Stephen, Assane Gueye. 5071-5078 [doi]
- Multimodal Italian Sign Language Recognition with Radar-Video Late Fusion on the MultiMeDaLIS DatasetJakub F. Juranek. 5079-5085 [doi]
- The SignEval 2025 Challenge at the ICCV Multimodal Sign Language Recognition Workshop: Results and DiscussionHamzah Luqman, Raffaele Mineo, Murtadha Aljubran, Ahmed Abul Hasanaath, Amelia Sorrenti, Sarah N. Alyami, Sadam Al-Azani, Maad Alowaifeer, Jihwan Moon, Vaclav Javorek, Tomas Zelezný, Marek Hrúz, Gaia Caligiore, Silvio Giancola, Senya Polikovsky, Motaz Alfarraj, Sabina Fontana, Mufti Mahmud, Muhammad Haris Khan, Kamrul Islam, Sevgi Zubeyde Gurbuz, Egidio Ragonese, Giovanni Bellitto, Federica Proietto Salanitri, Concetto Spampinato, Simone Palazzo. 5086-5095 [doi]
- VidMP3: Video Editing by Representing Motion with Pose and Position PriorsSandeep Mishra, Oindrila Saha, Alan C. Bovik. 5096-5106 [doi]
- LSSGen: Leveraging Latent Space Scaling in Flow and Diffusion for Efficient Text to Image GenerationJyun-Ze Tang, Chih-Fan Hsu, Jeng-Lin Li, Ming-Ching Chang, Wei-Chao Chen. 5107-5116 [doi]
- C2D-ISR: Optimizing Attention-Based Image Super-Resolution from Continuous to Discrete ScalesYuxuan Jiang 0015, Chengxi Zeng, Siyue Teng, Fan Zhang 0017, Xiaoqing Zhu, Joel Sole, David Bull. 5117-5127 [doi]
- Multi-Scale Contrastive-Adversarial Distillation for Super-ResolutionDonggeun Ko, Youngsang Kwak, San Kim, Jaehwa Kwak, Jaekwang Kim. 5128-5137 [doi]
- High-Fidelity 4x Neural Reconstruction of Real-Time Path Traced VideosZhiqiang Lao, Zekai Chen, Jiali Cui, Marcos Conde Osorio, Heather Yu. 5138-5146 [doi]
- Species-level Detection and Tracking of Caribbean Coral Reef FishLevi Cai, Austin Greene, Daniel Yang, Nadège Aoki, Sierra D. Jarriel, Jasper Ha, Nathan Formel, T. Aran Mooney, Yogesh A. Girdhar. 5147-5151 [doi]
- Fine-Grained Beetle Taxonomy with Vision Models: A Benchmark on Long-Tailed and Domain-Adaptive ClassificationS. M. Rayeed, Alyson East, Samuel Stevens 0001, Sydne Record, Charles V. Stewart. 5152-5158 [doi]
- Back Home: A Computer Vision Solution to Seashell Identification for Ecological RestorationAlexander Valverde, Luis Solano, André Montoya. 5159-5168 [doi]
- Bridging Domain Gaps for Fine-Grained Moth Classification Through Expert-Informed Adaptation and Foundation Model PriorsRoss Gardiner, Guillaume Mougeot, Sareh Rowlands, Benno I. Simmons, Flemming Helsing, Toke Thomas Høye. 5169-5174 [doi]
- Divide and Conquer: Structured Reranking for Expert-Level Ecological Image RetrievalAsmi Kumar, Edward Vendrow, Sara Beery. 5175-5184 [doi]
- MONET: Multi-Modal Online Continual Learning with Novelty EstimationEvelyn Chee, Wynne Hsu, Mong-Li Lee. 5185-5194 [doi]
- Adapting Curricula by Tracking the Gap between Validation and Training LossRaghavendra Singh. 5195-5202 [doi]
- Accuracy Improvement of Prompt-Based Continual Learning with Past InformationIkki Kato, Kazuhiro Hotta. 5203-5211 [doi]
- RICO: Two Realistic Benchmarks and an In-Depth Analysis for Incremental Learning in Object DetectionMatthias Neuwirth-Trapp, Maarten Bieshaar, Danda Pani Paudel, Luc Van Gool. 5212-5223 [doi]
- Incremental Object Detection with Prompt-Based MethodsMatthias Neuwirth-Trapp, Maarten Bieshaar, Danda Pani Paudel, Luc Van Gool. 5224-5232 [doi]
- Core Tokensets for Data-Efficient Sequential Training of TransformersSubarnaduti Paul, Manuel Brack, Patrick Schramowski, Kristian Kersting, Martin Mundt. 5233-5243 [doi]
- Does Prior Data Matter? Exploring Joint Training in the Context of Few-Shot Class-Incremental LearningShiwon Kim, Dongjun Hwang, Sungwon Woo, Rita Singh. 5244-5253 [doi]
- Evolving from Unknown to Known: Retentive Angular Representation Learning for Incremental Open Set RecognitionRunqing Yang, Yimin Fu, Changyuan Wu, Zhunga Liu. 5254-5263 [doi]
- B-Cos Networks as an Architectural Inductive Bias for Mitigating Catastrophic ForgettingAmelia Sorrenti, Giovanni Bellitto, Salvatore Calcagno 0002, Rutger Hendrix, Concetto Spampinato, Simone Palazzo. 5264-5272 [doi]
- Can Synthetic Images Conquer Forgetting? Beyond Unexplored Doubts in Few-Shot Class-Incremental LearningJunsu Kim, Yunhoe Ku, SeungRyul Baek. 5273-5282 [doi]
- Warehouse Spatial Question Answering with LLM Agent: 1st Place Solution of the 9th AI City Challenge Track 3Hsiang-Wei Huang, Pyong-Kun Kim, Jen-Hao Cheng, Kuang-Ming Chen, Cheng-Yen Yang, Bahaa Alattar, Yi-Ru Lin, Sangwon Kim, Kwangju Kim, Chung-I Huang, Jenq-Neng Hwang. 5283-5287 [doi]
- TrafficInternVL: Understanding Traffic Scenarios with Vision-Language ModelsHsiu-Fu Wu, Ya-Ting Yang, Yung-Ter Chen, I-Fan Chou. 5288-5295 [doi]
- Enhanced Fisheye Object Detection via YOLO Ensemble Learning and Weighted Box FusionChun-Ming Tsai, Li-Li Wu, Tsung-Yu Chen. 5296-5303 [doi]
- MCBLT: Multi-Camera Multi-Object 3D Tracking in Long VideosYizhou Wang 0001, Tim Meinhardt, Orcun Cetintas, Cheng-Yen Yang, Sameer Satish Pusegaonkar, Benjamin Missaoui, Sujit Biswas, Zheng Tang, Laura Leal-Taixé. 5304-5313 [doi]
- A Unified Detection Pipeline for Robust Object Detection in Fisheye-Based Traffic SurveillanceNeema Jakisa Owor, Joshua Kofi Asamoah, Tanner Wambui Muturi, Anneliese Jakisa Owor, Blessing Agyei Kyem, Andrews Danyo, Yaw Adu-Gyamfi, Armstrong Aboah. 5314-5321 [doi]
- Augmentation, Distillation and Optimization: A Practical Pipeline for Fisheye Object Detection on Edge DevicesViet Hung Duong, Van Duy Truong, Duy Khanh Dinh, Huan Vu, Thien Van Luong, Tien Cuong Nguyen. 5322-5329 [doi]
- SmolRGPT: Efficient Spatial Reasoning for Warehouse Environments with 600M ParametersAbdarahmane Traoré, Éric Hervet, Andy Couturier. 5330-5338 [doi]
- Prompt-Guided Spatial Understanding with RGB-D Transformers for Fine-Grained Object Relation ReasoningTanner Wambui Muturi, Blessing Agyei Kyem, Joshua Kofi Asamoah, Neema Jakisa Owor, Richard Dyzinela, Andrews Danyo, Yaw Adu-Gyamfi, Armstrong Aboah. 5339-5347 [doi]
- DepthTrack: Cluster Meets BEV for Multi-Camera Multi-Target 3D TrackingTai Huu-Phuong Tran, Duong Nguyen-Ngoc Tran, Ngoc Doan-Minh Huynh, Chi Dai Tran, Long Hoang Pham, Quoc Pham-Nam Ho, Huy Hung Nguyen, Duong Khac Vu, Hyung-Min Jeon, Hyung Joon Jeon, Son Hong Phan, Trinh Le Ba Khanh, Jae Wook Jeon. 5348-5357 [doi]
- TrafficInternVL: Spatially-Guided Fine-Tuning with Caption Refinement for Fine-Grained Traffic Safety Captioning and Visual Question AnsweringSasin Phimsiri, Sarut Sunpawatr, Riu Cherdchusakulchai, Pornprom Kiawjak, Teepakorn Tosawadi, Suchat Tungjitnob, Visarut Trairattanapa, Supawit Vatathanavaro, Wasu Kudisthalert, Chaitat Utintu, Worawit Saetan, Nathamon Kongsawat, Phawat Borisuitsawat, Kasisdis Mahakijdechachai, Nitipan Su-Inn, Ek Thamwiwatthana, Vasin Suttichaya. 5358-5365 [doi]
- TrafficVILA: Scaling Vision-Language Models to High-Resolution Video Understanding for Traffic Safety AnalysisZaid Pervaiz Bhat, Seunghwan Cha, Rohan Gulati, Monika Jhuria, Varun Praveen, Tomasz Kornuta, Yao Lu, Vidya N. Murali. 5366-5373 [doi]
- EKI-GAN: Context-Aware Vehicle Trajectory Forecasting with Vehicle Factors and Environmental Information at Signalized IntersectionsChuheng Wei, Guoyuan Wu 0001, Matthew J. Barth. 5374-5383 [doi]
- Task-Specific Dual-Model Framework for Comprehensive Traffic Safety Video Description and AnalysisBlessing Agyei Kyem, Neema Jakisa Owor, Andrews Danyo, Joshua Kofi Asamoah, Eugene Kofi Okrah Denteh, Tanner Wambui Muturi, Anthony Dontoh, Yaw Adu-Gyamfi, Armstrong Aboah. 5384-5392 [doi]
- Data Augmentation is All You Need for Robust Fisheye Object DetectionLong Hoang Pham, Quoc Pham-Nam Ho, Duong Khac Vu, Huy Hung Nguyen, Chi Dai Tran, Duong Nguyen-Ngoc Tran, Tai Huu-Phuong Tran, Ngoc Doan-Minh Huynh, Hyung Joon Jeon, Hyung-Min Jeon, Son Hong Phan, Trinh Le Ba Khanh, Jae Wook Jeon. 5393-5401 [doi]
- Hierarchical Multi-Modal Fusion for Roadside VRU Detection: Method Complementarity Under Sparse Label ConstraintsChuheng Wei, Ziyan Zhang, Haishan Liu, Guoyuan Wu 0001, Matthew J. Barth. 5402-5409 [doi]
- Multimodal and Multi-task Fusion for Spatial ReasoningVan-Minh Dang, Thanh-Sach Le. 5410-5416 [doi]
- Multi-Camera 3D Object Tracking via 3D Point Clouds and Re-IdentificationJaewon Lee, Heecheol Kim, Doohee Lee, Kanghee Lee. 5417-5424 [doi]
- Domain-Aware Enhancements to Vision-Language Models for Urban Traffic Safety Question AnsweringVu Thanh Dat Ha, Tuan Huy Tran, Gang Thep Dong, Ngoc Chien Chu, Huan Vu, Tien Cuong Nguyen. 5425-5433 [doi]
- VGCRTrack: Multi-Camera 3D Tracking with View-Aware Geometric Center RefinementTrieu-Huy Phan, Duc-Duy Dinh, Thanh-Phong Huynh, Quoc-Thinh Le, Huy-Hoang Dang, Vu-Hoang Tran, Van-Tin Luu, Ching-Chun Huang. 5434-5440 [doi]
- A Real-Time Vehicle Detection Pipeline with Data-Centric Enhancements and Multi-Stage DETR DistillationHuy Minh Nhat Nguyen, Hieu Dinh Trung Pham, Khang Minh Le, Cuong Tuan Nguyen. 5441-5448 [doi]
- Online 3D Multi-Camera Perception Through Robust 2D Tracking and Depth-Based Late AggregationVu-Minh Le, Thao-Anh Tran, Duc Huy Do, Xuan Canh Do, Huong Ninh, Hai Tran. 5449-5459 [doi]
- TrafficVILA: A Multimodal Framework for Traffic Safety Description and AnalysisBeomseok Park, Wanzhao Yang, Sifan Yuan, Syed Muhammad Anwar, Ivan Marsic. 5460-5468 [doi]
- Efficient and Distortion-Aware Fisheye Object Detection for Edge DevicesBao Tran Gia, Tuong Bui Cong Khanh, Tam Le Thi Thanh, Hien Ho Trong, Tien Do, Thanh Duc Ngo, Duy-Dinh Le, Shin'ichi Satoh 0001. 5469-5475 [doi]
- Real-Time Object Detection on Edge Devices: A Fisheye Specific DFINEJaewoo Park, Jihae Lee, Yunjeong Yong. 5476-5485 [doi]
- Multi-Agent Cooperation for Traffic Safety Description and AnalysisRidham Kachhadiya, Dhanishtha Patil, David C. Anastasiu. 5486-5494 [doi]
- Boosting Fisheye Detection with Augmentations and EnsemblesChengzhi Qian, Jing Li, Yangyang Huang, Zhixin Lin, Yue Yao, Jianping Wang. 5495-5499 [doi]
- A Lightweight and Data-Centric Framework for Real-Time Object Detection in Fisheye CameraAn To Vinh, Nguyen Mai Vinh, Nguyen Luong Si, Duy Do Quoc, Duy Tran Khanh, Nguyen Nguyen-Hoang, Tien Van Do 0002, Thanh Duc Ngo, Duy-Dinh Le, Shin'ichi Satoh. 5500-5507 [doi]
- TinyGiantVLM: A Lightweight Vision-Language Architecture for Spatial Reasoning under Resource ConstraintsVinh-Thuan Ly, Hoang M. Truong, Xuan-Huong Nguyen. 5508-5515 [doi]
- STER-VLM: Spatio-Temporal with Enhanced Reference Vision-Language ModelsTinh-Anh Nguyen-Nhu, Triet Dao Hoang Minh, Dat To-Thanh, Phuc Le-Gia, Tuan Vo-Lan, Binh T. Nguyen 0001. 5516-5525 [doi]
- The 9th AI City ChallengeZheng Tang, Shuo Wang, David C. Anastasiu, Ming-Ching Chang, Anuj Sharma 0001, Quan Kong, Norimasa Kobori, Munkhjargal Gochoo, Ganzorig Batnasan, Munkh-Erdene Otgonbold, Fady Alnajjar, Jun-Wei Hsieh, Tomasz Kornuta, Xiaolong Li, Yilin Zhao, Han Zhang, Subhashree Radhakrishnan, Arihant Jain, Ratnesh Kumar 0004, Vidya N. Murali, Yuxing Wang, Sameer Satish Pusegaonkar, Yizhou Wang 0001, Sujit Biswas, Xunlei Wu, Zhedong Zheng, Pranamesh Chakraborty, Rama Chellappa. 5526-5535 [doi]
- DMS: Diffusion-Based Multi-Baseline Stereo Generation for Improving Self-Supervised Depth EstimationZihua Liu, Yizhou Li, Songyan Zhang, Masatoshi Okutomi. 5536-5547 [doi]
- ReBaIR: Reference-Based Image RestorationMichael Bernasconi, Abdelaziz Djelouah, Yang Zhang 0003, Markus Groß 0001, Christopher Schroers. 5548-5557 [doi]
- MedShift: Implicit Conditional Transport for X-Ray Domain AdaptationFrancisco Caetano, Christiaan G. A. Viviers, Peter H. N. de With, Fons van der Sommen. 5558-5567 [doi]
- LMLT: Low-to-high Multi-Level Vision Transformer for Lightweight Image Super-ResolutionJeongsoo Kim, Jongho Nang, Junsuk Choe. 5568-5578 [doi]
- DIFFRACT: Diffusion-Based Restoration via Adaptive Control and Thresholding for Diffraction ImagingNikolay Falaleev, Nikolai Orlov. 5579-5588 [doi]
- JFFRA: Joint Flow and Feature Refinement Using Attention for Video RestorationRanjith Merugu, Mohammad Sameer Suhail, Akshay P. Sarashetti, Venkata Bharath Reddy Reddem, Pankaj Kumar Bajpai, Amit Satish Unde. 5589-5599 [doi]
- AIM 2025 Challenge on High FPS Motion Deblurring: Methods and ResultsGeorge Ciubotariu, Florin-Alexandru Vasluianu, Zhuyun Zhou, Nancy Mehta, Radu Timofte, Ke Wu, Long Sun, Lingshun Kong, Zhongbao Yang, Jinshan Pan, Jiangxin Dong, Jinhui Tang 0001, Hao Chen, Yinghui Fang, Dafeng Zhang, Yongqi Song, Jiangbo Guo, Shuhua Jin, Zeyu Xiao, Rui Zhao 0001, Zhuoyuan Li, Cong Zhang, Yufeng Peng, Xin Lu 0006, Zhijing Sun, Chengjie Ge, Zihao Li, Zishun Liao, Ziang Zhou, Qiyu Kang, Xueyang Fu, Zheng-Jun Zha, Yuqian Zhang, Shuai Liu 0009, Jie Liu, Zhuhao Zhang, Lishen Qu, Zhihao Liu, Shihao Zhou 0003, Yaqi Luo, Juncheng Zhou, Jufeng Yang, Qianfeng Yang, Qiyuan Guan, Xiang Chen, Guiyue Jin, Jiyu Jin. 5600-5609 [doi]
- AIM 2025 Rip Current Segmentation (RipSeg) Challenge ReportAndrei Dumitriu, Florin Miron, Florin Tatui, Radu-Tudor Ionescu, Radu Timofte, Aakash Ralhan, Florin-Alexandru Vasluianu, Shenyang Qian, Mitchell Harley, Imran Razzak, Yang Song 0001, Pu Luo, Yumei Li, Cong Xu, Jinming Chai, Kexin Zhang 0003, Licheng Jiao, Lingling Li 0002, Siqi Yu, Chao Zhang, Kehuan Song, Fang Liu 0001, Puhua Chen, Xu Liu 0006, Jin Hu, Jinyang Xu, Biao Liu. 5610-5619 [doi]
- Diffusion-Based Compression Quality Tradeoffs Without RetrainingJonas Brenig, Radu Timofte. 5620-5629 [doi]
- AIM 2025 challenge on Inverse Tone Mapping Report: Methods and ResultsChao Wang, Francesco Banterle, Bin Ren 0005, Radu Timofte, Xin Lu 0006, Yufeng Peng, Chengjie Ge, Zhijing Sun, Ziang Zhou, Zihao Li, Zishun Liao, Qiyu Kang, Xueyang Fu, Zheng-Jun Zha, Zhijing Sun, Xingbo Wang, Kean Liu, Senyan Xu, Yang Qiu, Yifan Ding 0002, Gabriel Eilertsen, Jonas Unger, Zihao Wang, Ke Wu, Jinshan Pan, Zhen Liu 0022, Zhongyang Li, Shuai Liu 0009, S. M. Nadim Uddin. 5630-5643 [doi]
- Efficient High FPS Non-Uniform Motion Deblurring via Progressive LearningXin Lu 0006, Zhijing Sun, Chengjie Ge, Yufeng Peng, Ziang Zhou, Zihao Li, Zishun Liao, Dong Li 0055, Qiyu Kang, Xueyang Fu, Zheng-Jun Zha. 5644-5653 [doi]
- Boosting Inverse Tone Mapping via Diffusion RegularizationXin Lu 0006, Yufeng Peng, Chengjie Ge, Zhijing Sun, Ziang Zhou, Zishun Liao, Zihao Li, Dong Li 0055, Qiyu Kang, Xueyang Fu, Zheng-Jun Zha. 5654-5663 [doi]
- RCENet: Recursive Concatenation and Enhancement Network for Real-Time Super-ResolutionKihwan Yoon, Ganzorig Gankhuyag, Jinwoo Jeong. 5664-5673 [doi]
- AIM 2025 Challenge on Robust Offline Video Super-Resolution: Dataset, Methods and ResultsNikolai Karetin, Ivan Molodetskikh, Dmitry Vatolin, Radu Timofte, Yixin Yang, Junyang Chen 0001, Jiangxin Dong, Jinshan Pan, Zhihao Liu, Lishen Qu, Shihao Zhou, Jufeng Yang, Yuxuan Jiang 0015, Siyue Teng, Chengxi Zeng, Fan Zhang, David Bull 0001, Qi Tang, Jie Liu, Jie Tang 0006, Gangshan Wu. 5674-5682 [doi]
- LiteRT-Optimized INT8 LLM for Raspberry Pi4 DeploymentKihwan Yoon, Hyeon-Cheol Moon, Aeri Kim, Sungjei Kim, Sang-Seol Lee, Sung-Joon Jang, Ganzorig Gankhuyag, Jinwoo Jeong. 5683-5692 [doi]
- Multi-Scale Tensorial Summation and Dimensional Reduction Guided Neural Network for Edge DetectionLei Xu 0036, Mehmet Yamac, Mete Ahishali, Moncef Gabbouj. 5693-5702 [doi]
- AIM 2025 Low-Light RAW Video Denoising Challenge: Dataset, Methods and ResultsAlexander Yakovenko, George Chakvetadze, Ilya Khrapov, Maksim Zhelezov, Dmitry Vatolin, Radu Timofte, Youngjin Oh, Junhyeong Kwon, Junyoung Park, Nam Ik Cho, Senyan Xu, Ruixuan Jiang, Long Peng, Xueyang Fu, Zheng-Jun Zha, Xiaoping Peng, Hansen Feng, Zhanyi Tie, Ziming Xia, Lizhi Wang 0001. 5703-5713 [doi]
- AIM 2025 Challenge on Screen-Content Video Quality Assessment: Methods and ResultsNikolay Safonov, Rakhmanov Mikhail, Dmitry Vatolin, Radu Timofte, Chunyu Wu, Kejing Wu, Kishor Kumar Patro, Pankaj Rathour, Sumohana S. Channappayya, Pravin Pardhi, Vipin Milind Kamble, Kishor Bhurchandi, Biao Liu, Jin Hu, Jinyang Xu, Yang Dayu, Chen Yihua. 5714-5722 [doi]
- Optuna vs Code Llama: Are LLMs a New Paradigm for Hyperparameter Tuning?Roman Kochnev, Arash Torabi Goodarzi, Zofia Antonina Bentyn, Dmitry Ignatov, Radu Timofte. 5723-5733 [doi]
- Practical Manipulation Model for Robust Deepfake DetectionBenedikt Hopf, Radu Timofte. 5734-5743 [doi]
- AIM 2025 Challenge on Real-World RAW Image DenoisingFeiran Li, Jiacheng Li, Marcos V. Conde, Beril Besbinar, Vlad Hosu, Daisuke Iso, Radu Timofte. 5744-5757 [doi]
- Efficient Perceptual Image Super Resolution: AIM 2025 Study and BenchmarkBruno Longarela, Marcos V. Conde, Álvaro García, Radu Timofte. 5758-5771 [doi]
- Efficient Real-World Deblurring Using Single Images: AIM 2025 Challenge ReportDaniel Feijoo, Paula Garrido-Mellado, Marcos V. Conde, Jaesung Rim, Álvaro García, Sunghyun Cho, Radu Timofte. 5772-5780 [doi]
- Probabilistic Domain Adaptation for Biomedical Image SegmentationAnwai Archit, Constantin Pape. 5781-5789 [doi]
- From Natural to Nanoscale: Training ControlNet on Scarce FIB-SEM Data for Augmenting Semantic Segmentation DataHannah Kniesel, Pascal Rapp, Pedro Hermosilla, Timo Ropinski. 5790-5799 [doi]
- A Comparison of Data-Driven Shape Quantification Methods for 2D Microscopy ImagesAnna Foix Romero, Alexander Krull, Virginie Uhlmann. 5800-5809 [doi]
- NeuReg: Domain-Invariant Brain Registration Across Species and ModalitiesTaha Razzaq, Asim Iqbal. 5810-5820 [doi]
- Topology-Preserving Image Segmentation with Spatial-Aware Persistent Feature MatchingBo Wen, Haochen Zhang, Dirk-Uwe G. Bartsch, William R. Freeman, Truong Q. Nguyen, Cheolhong An. 5821-5830 [doi]
- Cubic: CUDA-Accelerated 3D Bioimage ComputingAlexandr A. Kalinin, Anne E. Carpenter, Shantanu Singh, Matthew J. O'Meara. 5831-5840 [doi]
- Preserving Instance Continuity and Length in Segmentation Through Connectivity-Aware Loss ComputationKarol Szustakowski, Luk Frank, Julia Esser, Jan Gründemann, Marie Piraud. 5841-5850 [doi]
- An Investigation of Unsupervised Cell Tracking and Interactive Fine-TuningManan Lalit, Jan Funke. 5851-5859 [doi]
- SynapFlow: A Modular Framework Towards Large-Scale Analysis of Dendritic SpinesPamela Osuna-Vargas, Altug Kamacioglu, Dominik F. Aschauer, Petros E. Vlachos, Sercan Alipek, Jochen Triesch, Simon Rumpel, Matthias Kaschube. 5860-5870 [doi]
- SelfAdapt: Unsupervised Domain Adaptation of Cell Segmentation ModelsFabian H. Reith, Jannik Franzen, Josef Lorenz Rumberger, Dagmar Kainmueller. 5871-5878 [doi]
- Noise is an Efficient Learner for Zero-Shot Vision-Language ModelsRaza Imam, Asif Hanif, Jian Zhang, Khaled Waleed Dawoud, Yova Kementchedjhieva, Mohammad Yaqub. 5879-5888 [doi]
- STORM: Token-Efficient Long Video Understanding for Multimodal LLMsJindong Jiang, Xiuyu Li, Zhijian Liu, Muyang Li, Guo Chen 0006, Zhiqi Li, De-An Huang, Guilin Liu, Zhiding Yu, Kurt Keutzer, Sungjin Ahn, Jan Kautz, Hongxu Yin, Yao Lu 0006, Song Han 0003, Wonmin Byeon. 5889-5900 [doi]
- Context-Aware Image Caption Editing via Hallucination-Resistant Visual Instruction TuningYoonhyung Kim, Byung-Ok Kang, Hwa Jeon Song. 5901-5911 [doi]
- PLOT-TAL: Prompt-Learning with Optimal Transport for Few-Shot Temporal Action LocalizationEdward Fish, Andrew Gilbert. 5912-5921 [doi]
- AutoConcept: Unsupervised Extraction of Constituent Concepts from Single ImagePranav Singh Chib, Kirtankumar Vijaykumar Patel, Mudit Gupta, Pise Ashutosh Kalidas, Pravendra Singh. 5922-5932 [doi]
- Demographic Differentials in Face Image Quality: Evaluation and Comparison on Real and Synthetic DataAndré Dörsch, Johannes Merkle, Benjamin Tams, G. Gutierrez Alvarez, Peter Munch, Christoph Busch 0001, Christian Rathgeb. 5933-5941 [doi]
- Intrinsically-Interpretable Siamese Networks for Identity RecognitionMarco A. Rocha, Jaime S. Cardoso 0001, Helena Montenegro. 5942-5951 [doi]
- Evaluation of Human Visual Privacy Protection: A Three-Dimensional Framework and Benchmark DatasetSara Abdulaziz, Giacomo D'Amicantonio, Egor Bondarev. 5952-5961 [doi]
- TAIGen: Training-Free Adversarial Image Generation via Diffusion ModelsSusim Mukul Roy, Anubhooti Jain, Mayank Vatsa, Richa Singh 0001. 5962-5972 [doi]
- NegFaceDiff: The Power of Negative Context in Identity-Conditioned Diffusion for Synthetic Face GenerationEduarda Caldeira, Naser Damer, Fadi Boutros. 5973-5983 [doi]
- Now You See Me, Now You Don't: A Unified Framework for Expression Consistent Anonymization in Talking Head VideosAnil Egin, Andrea Tangherloni, Antitza Dantcheva. 5984-5993 [doi]
- ViT-FIQA: Assessing Face Image Quality using Vision TransformersAndrea Atzori, Fadi Boutros, Naser Damer. 5994-6004 [doi]
- Are You in or Out (of Gallery)? Wisdom from the Same-Identity CrowdAman Bhatta, Maria Dhakal, Michael C. King, Kevin W. Bowyer. 6005-6014 [doi]
- On Adversarial Robustness of Face Presentation Attack Detection AlgorithmsAkshay Agarwal 0001, Mayank Vatsa, Richa Singh 0001. 6015-6023 [doi]
- AI vs. Human Moderators: A Comparative Evaluation of Multimodal LLMs in Content Moderation for Brand SafetyAdi Levi, Or Levi, Sardhendu Mishra, Jonathan Morra. 6024-6032 [doi]
- A Survey of Human Synergy in Influencer Marketing Through Authenticity-Preserving Content Generation ApproachesNafi Diallo, Pegah Ojaghi. 6033-6040 [doi]
- CLASP: Adaptive Spectral Clustering for Unsupervised Per-Image SegmentationMax Curie, Paulo da Costa 0002. 6041-6050 [doi]
- Seeing the Signs: A Survey of Edge-Deployable OCR Models for Billboard Visibility AnalysisMaciej Szankin, Vidhyananth Venkatasamy, Lihang Ying. 6051-6059 [doi]
- Privacy-Preserving Audience Analytics: Lightweight Thermal Face Recognition for Real-Time Marketing Intelligence at the EdgeMaciej Szankin, Jacek Ruminski. 6060-6068 [doi]
- Align Before You Recommend: Parameter Efficient Personalization via Cross-Attentive Fusion of Hierarchical Language ModelsAlicja Kwasniewska, Marcin Bednarz, Chad Neal. 6069-6078 [doi]
- Training-Free Diffusion Framework for Stylized Image Generation with Identity PreservationMohammad Ali Rezaei, Helia Hajikazem, Saeed Khanehgir, Mahdi Javanmardi. 6079-6088 [doi]
- A Multi-Stage Pipeline for Accurate Handwritten Information Extraction from Financial FormsGuanghui Wang, Jinze Yu, Xing Zhang, Tomal Deb, Xuefeng Liu, Peiyang He. 6089-6097 [doi]
- MGT: Extending Virtual Try-Off to Multi-Garment ScenariosRiza Velioglu, Petra Bevandic, Robin Chan 0001, Barbara Hammer. 6098-6107 [doi]
- Cross-Lingual Visual Text Stylization and Synthesis Incorporating Text Rendering and Diffusion ModelMinmin Shen, Caren Chen. 6108-6116 [doi]
- ByDeWay: Boost Your Multimodal LLM with DEpth Prompting in a Training-Free WayRajarshi Roy 0008, Devleena Das, Ankesh Banerjee, Arjya Bhattacharjee, Kousik Dasgupta, Subarna Tripathi. 6117-6123 [doi]
- Reasoning-Enhanced Prompt Strategies for Multi-Label ClassificationJinze Yu. 6124-6129 [doi]
- From Pixels to Context: Adapting Generative Models for Advertising at ScaleHyunHee Chung, Taeyoung Na. 6130-6137 [doi]
- Toward Scalable Video Narration: A Training-Free Approach Using Multimodal Large Language ModelsTz-Ying Wu, Tahani Trigui, Sharath Nittur Sridhar, Anand V. Bodas, Subarna Tripathi. 6138-6146 [doi]
- Similarity-Aware Selective State-Space Modeling for Semantic CorrespondenceSeungwook Kim, Minsu Cho. 6147-6157 [doi]
- Video-MMLU: A Massive Multi-Discipline Lecture Understanding BenchmarkEnxin Song, Wenhao Chai, Weili Xu, Jianwen Xie, Yuxuan Liu, Gaoang Wang. 6158-6172 [doi]
- Zero-Shot Customized Video Editing with Diffusion Feature TransferWei Chen, Huidong Liu, Yang Liu, Chien-Chih Wang, Moyan Li, Hongdong Li, Bryan Wang. 6173-6182 [doi]
- BLIP-3: A Family of Open Large Multimodal ModelsLe Xue, Manli Shu, Anas Awadalla, Jun Wang, An Yan, Senthil Purushwalkam, Honglu Zhou, Viraj Prabhu, Yutong Dai 0002, Michael S. Ryoo, Shrikant Kendre, Jieyu Zhang, Shao-Yen Tseng, Gustavo A. Lujan-Moreno, Matthew L. Olson, Musashi Hinck, David Cobbley, Vasudev Lal, Can Qin, Shu Zhang 0007, Chia-Chih Chen, Ning Yu 0006, Juntao Tan, Tulika Manoj Awalgaonkar, Shelby Heinecke, Huan Wang 0016, Yejin Choi 0001, Ludwig Schmidt, Zeyuan Chen 0001, Silvio Savarese, Juan Carlos Niebles, Caiming Xiong, Ran Xu 0001. 6183-6194 [doi]
- Zero-Shot Subject-Centric Generation for Creative Application Using Entropy FusionKaifeng Zou, Xiaoyi Feng, Tao Huang, Zizhou Huang, Haihang Zhang, Yuntao Zou, Dagang Li. 6195-6204 [doi]
- Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control is Easier than You ThinkLiang Chen 0024, Shuai Bai, Wenhao Chai, Weichu Xie, Haozhe Zhao, Leon Vinci, Junyang Lin, Baobao Chang. 6205-6215 [doi]
- CST Anti-UAV: A Thermal Infrared Benchmark for Tiny UAV Tracking in Complex ScenesBin Xie, Congxuan Zhang, Fagan Wang, Peng Liu 0024, Feng Lu, Zhen Chen 0004, Weiming Hu 0004. 6216-6225 [doi]
- Probing the Representational Power of Sparse Autoencoders in Vision ModelsMatthew L. Olson, Musashi Hinck, Neale Ratzlaff, Changbai Li, Phillip Howard, Vasudev Lal, Shao-Yen Tseng. 6226-6236 [doi]
- M3DocVQA: Multi-Modal Multi-Page Multi-Document UnderstandingJaemin Cho 0001, Debanjan Mahata, Ozan Irsoy, Yujie He 0003, Mohit Bansal. 6237-6247 [doi]
- From Flat to Round: Redefining Brain Decoding with Surface-Based fMRI and Cortex StructureSijin Yu, Zijiao Chen, Wenxuan Wu, Shengxian Chen, Zhongliang Liu, Jingxin Nie, Xiaofen Xing, Xiangmin Xu, Xin Zhang 0013. 6248-6257 [doi]
- Debias Your Large Multi-Modal Model at Test-Time via Non-Contrastive Visual Attribute SteeringNeale Ratzlaff, Matthew Lyle Olson, Musashi Hinck, Shao-Yen Tseng, Vasudev Lal, Phillip Howard. 6258-6267 [doi]
- DEARLi: Decoupled Enhancement of Recognition and Localization for Semi-Supervised Panoptic SegmentationIvan Martinovic, Josip Saric, Marin Orsic, Matej Kristan, Sinisa Segvic. 6268-6279 [doi]
- Multi-Objective Optimization for Deep Neural Network CalibrationDexter Neo, Tsuhan Chen. 6280-6291 [doi]
- SCRAMBLe: Enhancing Multimodal LLM Compositionality with Synthetic Preference DataSamarth Mishra, Kate Saenko, Venkatesh Saligrama. 6292-6302 [doi]
- BadPatch: Diffusion-Based Generation of Physical Adversarial PatchesZhixiang Wang, Xingjun Ma, Yu-Gang Jiang. 6303-6313 [doi]
- AdCorDA: Classifier Refinement via Adversarial Correction and Domain AdaptationLulan Shen, Ali Edalati, Xiangyu Li, Brett H. Meyer, Warren J. Gross, James J. Clark. 6314-6323 [doi]
- ORXE: Orchestrating Experts for Dynamically Configurable EfficiencyQingyuan Wang 0002, Guoxin Wang 0003, Barry Cardiff, Deepu John. 6324-6334 [doi]
- SynBalance: Harnessing Synthetic Data in Long-tailed RecognitionZhongyu Jiang, Jiarui Cai, Chang Liu, Dongsheng An, Jonathan Wu. 6335-6344 [doi]
- CAFE: Unifying Representation and Generation with Contrastive-Autoregressive FinetuningHao Yu, Zhuokai Zhao, Shen Yan 0007, Lukasz Korycki, Jianyu Wang, Baosheng He, Jiayi Liu, Lizhu Zhang, Xiangjun Fan, Hanchao Yu. 6345-6356 [doi]
- GeoDiff: Geometry-Guided Diffusion for Metric Depth EstimationTuan Pham, Thanh Tung Le, Xiaohui Xie, Stephan Mandt. 6357-6367 [doi]
- Data Leakage in Visual DatasetsPatrick Ramos, Ryan Ramos, Noa Garcia. 6368-6378 [doi]
- IDMR: Towards Instance-Driven Precise Visual Correspondence in Multimodal RetrievalBangwei Liu, Yicheng Bao, Shaohui Lin, Xuhong Wang, Xin Tan 0002, Yingchun Wang, Yuan Xie 0006, Chaochao Lu. 6379-6388 [doi]
- MIDAS: Modeling Ground-Truth Distributions with Dark Knowledge for Domain Generalized Stereo MatchingPeng Xu 0026, Zhiyu Xiang, Jingyun Fu, Tianyu Pu, Hanzhi Zhong, Eryun Liu. 6389-6399 [doi]
- OpenInsGaussian: Open-Vocabulary Instance Gaussian Segmentation with Context-Aware Cross-View FusionTianyu Huang, Runnan Chen, Dongting Hu, Fengming Huang, Mingming Gong, Tongliang Liu. 6400-6409 [doi]
- Scaling Open-Vocabulary Action DetectionZhen Hao Sia, Yogesh Singh Rawat. 6410-6420 [doi]
- CAM-Seg: A Continuous-valued Embedding Approach for Semantic Image GenerationMasud Ahmed, Zahid Hasan 0001, Syed Arefinul Haque, Abu Zaher Md Faridee, Sanjay Purushotham, Suya You, Nirmalya Roy. 6421-6430 [doi]
- NAS Just Once: Neural Architecture Search for Joint Image-Video RecognitionSofia Casarin, Sergio Escalera, Oswald Lanz. 6431-6441 [doi]
- DNF-Avatar: Distilling Neural Fields for Real-time Animatable Avatar RelightingZeren Jiang, Shaofei Wang 0001, Siyu Tang 0001. 6442-6453 [doi]
- SPIE: Semantic and Structural Post-Training of Image Editing Diffusion Models with AI FeedbackElior Benarous, Yilun Du, Heng Yang. 6454-6466 [doi]
- LOCAL: Latent Orthonormal Contrastive Learning for Paired Image ClassificationFei Dou, Jin Lu 0001, Tan Zhu, Jinbo Bi. 6467-6476 [doi]
- Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled SamplingSubin Kim, Seoung Wug Oh, Jui-Hsien Wang, Joon-Young Lee, Jinwoo Shin. 6477-6488 [doi]
- 3D Gaussian Representations with Motion Trajectory Field for Monocular Dynamic Scene ReconstructionXuesong Li 0001, Lars Petersson, Vivien Rolland. 6489-6498 [doi]
- Improving Viewpoint Consistency in 3D Generation via Structure Feature and CLIP GuidanceQing Zhang, Jinguang Tong, Jing Zhang, Jie Hong, Xuesong Li. 6499-6508 [doi]
- Articulated Object Understanding from a Single Video SequenceArslan Artykov, Clémentin Boittiaux, Vincent Lepetit. 6509-6518 [doi]
- Temporal Multimodal Memory Banks for Agentic ReasoningPrasanth Yadla. 6519-6526 [doi]
- Collaborative Multimodal Agent Networks: Dynamic Specialization and Emergent Communication for Complex Scene UnderstandingPrasanth Yadla. 6527-6534 [doi]
- How Well do Vision-Language Models Understand Cities? A Comparative Study on Spatial Reasoning from Street-View ImagesJuneyoung Ro, Namwoo Kim, Yoonjin Yoon. 6535-6544 [doi]
- Multimodal Dual-domain Learning for Image FusionHeng Wang, Mingxin Jin, Cong Wang, Yuan Yuan. 6545-6554 [doi]
- Multimodal Monocular Geometric Clues-Informed Multi-View Stereo NetworkWanjuan Su, Shili Xiong. 6555-6564 [doi]
- Test-time Prompt Refinement for Text-to-Image ModelsMohammed Abdul Hafeez Khan, Yash Jain, Siddhartha Bhattacharyya, Vibhav Vineet. 6565-6575 [doi]
- MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and OutlookPeng Xu, Shengwu Xiong 0001, Jiajun Zhang, Yaxiong Chen, Bowen Zhou 0002, Chen Change Loy, David A. Clifton, Kyoung Mu Lee, Luc Van Gool, Ruiming He, Ruilin Yao, Xinwei Long, Jirui Huang, Kai Tian, Sa Yang, Yihua Shao, Jin Feng, Yue Zhong, Jiakai Zhou, Cheng Tang, Tianyu Zou, Yifang Zhang, Junming Liang, Guoyou Li, Zhaoxiang Wang, Qiang Zhou, Yichen Zhao, Shili Xiong, Hyeongjin Nam, Jaerin Lee, Jaeyoung Chung, Joonkyu Park, Junghun Oh, Kanggeon Lee, Wooseok Lee, Juneyoung Ro, Turghun Osman, Can Hu, Chaoyang Liao, Cheng Chen, Chengcheng Han 0006, Chenhao Qiu, Chong Peng, Cong Xu, Dailin Li, Feiyu Wang, Feng Gao, Guibo Zhu, Guopeng Tang, Haibo Lu, Han Fang, Han Qi, Hanxiao Wu, Haobo Cheng, Hongbo Sun, Hongyao Chen, Huayong Hu, Hui Li, Jiaheng Ma, Jiang Yu, Jianing Wang, Jie Yang, Jing He, Jinglin Zhou, Jingxuan Li, Josef Kittler, Lihao Zheng, Linnan Zhao, Mengxi Jia, Muyang Yan, Nguyen Thanh Thien, Pu Luo, Qi Li, Shien Song, Shijie Dong, Shuai Shao, Shutao Li, Taofeng Xue, Tianyang Xu, Tianyi Gao, Tingting Li, Wei Zhang, Weiyang Su, Xiaodong Dong, Xiao-jun Wu, Xiaopeng Zhou, Xin Chen, Xin Wei, Xinyi You, Xudong Kang, Xujie Zhou, Xusheng Liu, Yanan Wang, Yanbin Huang, Yang Liu, Yang Yang, Yanglin Deng, Yashu Kang, Ye Yuan, Yi Wen, Yicen Tian, Yilin Tao, Yin Tang, Yipeng Lin, Yiqing Wang, Yiting Xi, Yongkang Yu, Yumei Li, Yuxin Qin, Yuying Chen, Yuzhe Cen, Zhaofan Zou, Zhaohong Liu, Zhehao Shen, Zhenglin Du, Zhengyang Li, Zhenni Huang, Zhenwei Shao, Zhilong Song, Zhiyong Feng, Zhiyu Wang, Zhou Yu, Ziang Li, Zihan Zhai, Zijian Zhang, Ziyang Peng, Ziyun Xiao, Zongshu Li. 6576-6605 [doi]
- A Multimodal Physics-Informed Neural Network Approach for Mean Radiant Temperature ModelingPouya Shaeri, Saud AlKhaled, Ariane Middel. 6606-6615 [doi]
- Generating Tennis Action Instruction Based on a Large Language ModelZhiyuan Wang, Zhiqian Xia, Guangyao Zhao, Kaiyue Lu, Haifeng Xia, Siyu Xia. 6616-6625 [doi]
- A Generalized Two-Stage Approach to Motion Style TransferTong Guo, Zhiqian Xia, Feng Yu, Haifeng Xia, Siyu Xia. 6626-6635 [doi]
- FreqCross: A Multi-Modal Frequency-Spatial Fusion Network for Robust Detection of Stable Diffusion 3.5 Generated ImagesGuang Yang. 6636-6643 [doi]
- Fitting Image Diffusion Models on Video DatasetsJuhun Lee, Simon S. Woo. 6644-6650 [doi]
- High-Fidelity Character Animation: Generating Coherent and Controllable Motion Videos from Static ImagesYongming Huang, Zhuojun Xia, Can Bu. 6651-6660 [doi]
- Attend and Replay: Efficient Action Understanding in Long Videos via Mechanistic InterpretabilityPuyue Hou, Jinjin Zhang, Di Huang. 6661-6670 [doi]
- Foundational Multi-Task Multimodal Model for Upper GI EndoscopyYuxuan He, Qilei Chen, Benyuan Liu, Yu Cao. 6671-6680 [doi]
- SignLLM: Sign Language Production Large Language ModelsSen Fang, Chen Chen 0001, Lei Wang 0108, Ce Zheng, Chunyu Sui, Yapeng Tian. 6681-6693 [doi]
- The Escalator Problem: Identifying Implicit Motion Blindness in AI for AccessibilityXiantao Zhang. 6694-6702 [doi]
- Who's Asking? Investigating Bias Through the Lens of Disability-Framed Queries in LLMsVishnu Hari, Kalpana Panda, Srikant Panda, Amit Agarwal, Hitesh Laxmichand Patel. 6703-6714 [doi]
- RampNet: A Two-Stage Pipeline for Bootstrapping Curb Ramp Detection in Streetscape Images from Open Government MetadataJohn S. O'Meara, Jared Hwang, Zeyu Wang, Michael Saugstad, Jon E. Froehlich. 6715-6724 [doi]
- VisualSpeaker: Visually-Guided 3D Avatar Lip SynthesisAlexandre Symeonidis-Herzig, Ozge Mercanoglu Sincan, Richard Bowden. 6725-6734 [doi]
- Seeing in 2D, Thinking in 3D: 3D Hand Mesh-Guided Feature Learning for Continuous FingerspellingKaterina Papadimitriou, Panayiotis Paraskevas Filntisis, George Retsinas, Gerasimos Potamianos, Petros Maragos. 6735-6744 [doi]
- Automated Context-Aware Navigation Support for Individuals with Visual Impairment Using Multimodal Language Models in Urban EnvironmentsAlton Chao, Erika Maquiling, Edmund Chao, Roshan Sanjeev, Tonko E. W. Bossen, Ross Greer. 6745-6752 [doi]
- Introduction to the First Workshop on Vision Foundation Models and Generative AI for AccessibilityYapeng Tian, Yuhang Zhao 0001, Jon E. Froehlich, Chu Li, Yuheng Wu. 6753-6762 [doi]
- T3D: Advancing 3D Medical Vision-Language Pre-Training by Learning Multi-View Visual ConsistencyChe Liu 0002, Cheng Ouyang, Yinda Chen, César Quilodrán Casas, Lei Ma 0008, Jie Fu 0001, Yike Guo, Anand Shah, Wenjia Bai, Rossella Arcucci. 6763-6773 [doi]
- GTGM: Generative Text-Guided 3D Vision-Language Pretraining for Medical Image SegmentationYinda Chen, Che Liu, Wei Huang, Xiaoyu Liu 0006, Haoyuan Shi, Sibo Cheng, Rossella Arcucci, Zhiwei Xiong. 6774-6783 [doi]
- Unified Supervision for Vision-Language Modeling in 3D Computed TomographyHao-Chih Lee, Zelong Liu, Hamza Ahmed, Spencer Kim, Sean Huver, Vishwesh Nath, Zahi A. Fayad, Timothy Deyer, XueYan Mei. 6784-6792 [doi]
- Prediction Degeneracy in Medical Vision-Language Models: Implications for Robustness and InterpretabilityMartin Goetze, Dennis Eschweiler, Brendan Huang, Gustav Müller-Franzes, Carolina Ramirez, Madeline Hess, Lavinia Goldermann, Sharmila Majumdar, Daniel Truhn. 6793-6800 [doi]
- RadZero3D: Bridging Self-Supervised Video Models and Medical Vision-Language Alignment for Zero-Shot Chest CT InterpretationJonggwon Park, Kyoyun Choi, Byungmu Yoon, Hong Geun Cho, Bumcheol Hwang. 6801-6808 [doi]
- CTFlow: Video-Inspired Latent Flow Matching for 3D CT SynthesisJiayi Wang, Hadrien Reynaud, Franciskus Xaverius Erick, Bernhard Kainz. 6809-6817 [doi]
- Foundation Models for Multimodal MRI Synthesis with Language GuidanceMahmut Yurt, Xiaozhi Cao, Zihan Zhou, Kawin Setsompop, Shreyas Vasanawala, John M. Pauly. 6818-6823 [doi]
- Patch-Wise Intensity Mapping for Individualized Brain Abnormality Detection in Alzheimer's DiseaseYangshuang Xu, Rongjie Liu, Chao Huang. 6824-6833 [doi]
- CT-GRAPH: Hierarchical Graph Attention Network for Anatomy-Guided CT Report GenerationHamza Kalisch, Fabian Hörst, Jens Kleesiek, Ken Herrmann, Constantin Seibold. 6834-6843 [doi]
- Preprocessing-Architecture Co-Design for Multi-Label Classification in 3D Chest CT ScansJung Jun Ah, JuHui Lee, Jihye Heo, Dongheon Lee. 6844-6852 [doi]
- Task-Specific Generative Dataset Distillation with Difficulty-Guided SamplingMingzhuo Li, Guang Li 0008, Jiafeng Mao, Linfeng Ye, Takahiro Ogawa 0001, Miki Haseyama. 6853-6860 [doi]
- LG-Traj: LLM Guided Pedestrian Trajectory PredictionPranav Singh Chib, Pravendra Singh. 6861-6871 [doi]
- Objaverse++: Curated 3D Object Dataset with Quality AnnotationsChendi Lin, Heshan Liu, Qunshu Lin, Zachary Bright, Shitao Tang, Yihui He, Minghao Liu, Ling Zhu, Cindy X. Le. 6872-6881 [doi]
- How to Train Your Text-to-Image Model: Evaluating Design Choices for Synthetic Training CaptionsManuel Brack, Sudeep Katakol, Felix Friedrich, Patrick Schramowski, Hareesh Ravi, Kristian Kersting, Ajinkya Kale. 6882-6891 [doi]
- VolDoGER: LLM-Assisted Datasets for Domain Generalization in Vision-Language TasksJuhwan Choi, Junehyoung Kwon, Jungmin Yun, Seunguk Yu, Youngbin Kim. 6892-6902 [doi]
- DISC-GAN: Disentangling Style and Content for Cluster-Specific Synthetic Underwater Image GenerationSneha Varur, Anirudh R. Hanchinamani, Tarun S. Bagewadi, Uma Mudenagudi, Chaitra Desai, Sujata C, Padmashree Desai, Sumit Meharwade. 6903-6911 [doi]
- Efficient Learning for Product Attributes with Compact Multimodal ModelsMandar Kulkarni. 6912-6918 [doi]
- SYM3D: Canonicalizing Triplanes via Symmetry for Single-View 3D LearningJing Yang, Kyle Fogarty, Fangcheng Zhong, A. Cengiz Öztireli. 6919-6927 [doi]
- Class-Proportional Coreset Selection for Difficulty-Separable DataElisa Tsai, Haizhong Zheng, Atul Prakash 0001. 6928-6937 [doi]
- Federated Active Learning for Target Domain GeneralisationRazvan Caramalau, Binod Bhattarai, Danail Stoyanov. 6938-6947 [doi]
- Synthetic Captions for Open-Vocabulary Zero-Shot SegmentationTim Lebailly, Vijay Veerabadran, Satwik Kottur, Karl Ridgeway, Michael Louis Iuzzolino. 6948-6959 [doi]
- Few-Shot Vision-Language Reasoning for Satellite Imagery via Verifiable RewardsAybora Köksal, A. Aydin Alatan. 6960-6969 [doi]
- OmViD: Omni-Supervised Active Learning for Video Action DetectionAayush Jung Rana, Akash Kumar 0016, Vibhav Vineet, Yogesh S. Rawat. 6970-6980 [doi]
- MINDiff: Mask-Integrated Negative Attention for Controlling Overfitting in Text-to-Image PersonalizationSeulgi Jeong, Jaeil Kim. 6981-6990 [doi]
- Locally Controlled Face Aging with Latent Diffusion ModelsLais Isabelle Alves dos Santos, Julien Despois, Thibaut Chauffier, Sileye O. Ba, Giovanni Palma. 6991-6999 [doi]
- Only-Style: Stylistic Consistency in Image Generation without Content LeakageTilemachos Aravanis, Panayiotis Paraskevas Filntisis, Petros Maragos, George Retsinas. 7000-7010 [doi]
- Target Attribute Diffusion ModelsWilliam Loh, Yanting Miao, Pascal Poupart, Suraj Kothawade. 7011-7019 [doi]
- Dual Orthogonal Guidance for Robust Diffusion-Based Handwritten Text GenerationKonstantina Nikolaidou, George Retsinas, Giorgos Sfikas, Silvia Cascianelli, Rita Cucchiara, Marcus Liwicki. 7020-7029 [doi]
- UP-VTON: A Unified Virtual Try-On Framework Supporting Mask, Mask-Free, and Prompt-Driven GuidanceYoungjoo Jo, Minho Park 0002, Dong-Oh Kang. 7030-7038 [doi]
- Tight Inversion: Image-Conditioned Inversion for Real Image EditingEdo Kadosh, Nir Goren, Or Patashnik, Daniel Garibi, Daniel Cohen-Or. 7039-7049 [doi]
- MotionMatcher: Cinematic Motion Customization of Text-to-Video Diffusion Models via Motion Feature MatchingYen-Siang Wu, Chi-Pin Huang, Fu-En Yang, Yu-Chiang Frank Wang. 7050-7060 [doi]
- Human Preference-Aligned Concept Customization Benchmark via Decomposed EvaluationReina Ishikawa, Ryo Fujii, Hideo Saito 0001, Ryo Hachiuma. 7061-7070 [doi]
- Reinforcement Learning Meets Masked Video Modeling: Trajectory-Guided Adaptive Token SelectionAyush K. Rai, Kyle Min 0001, Tarun Krishna, Feiyan Hu, Alan F. Smeaton, Noel E. O'Connor. 7071-7081 [doi]
- A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic QualityMohamed Elmoghany, Ryan A. Rossi, Seunghyun Yoon 0002, Subhojyoti Mukherjee, Eslam Mohamed Bakr, Puneet Mathur, Gang Wu 0013, Viet Dac Lai, Nedim Lipka, Ruiyi Zhang, Varun Manjunatha, Chien Nguyen, Daksh Dangi, Abel Salinas, Hongjie Chen 0003, Xiaolei Huang 0002, Joe Barrow, Nesreen K. Ahmed, Hoda Eldardiry, Namyong Park 0001, Yu Wang 0160, Zhengzhong Tu, Thien Huu Nguyen, Dinesh Manocha, Mohamed Elhoseiny, Franck Dernoncourt. 7082-7094 [doi]
- Forgetting-Free Incremental Panoptic Lifting by Maximum-Visibility Viewpoint SelectionAkira Kohjin, Motoharu Sonogashira, Masaaki Iiyama, Yasutomo Kawanishi. 7095-7104 [doi]
- SELDOM: Scene Editing via Latent Diffusion with Object-Centric ModificationsRichard E. L. Higgins, David F. Fouhey. 7105-7117 [doi]
- Globally Optimal Registration of Dense Terrestrial Laser Scans from Coarse SamplingDorian Kempf, Guillaume Caron, El Mustapha Mouaddib, Fumio Kanehiro. 7118-7127 [doi]
- PlantDreamer: Achieving Realistic 3D Plant Models with Diffusion-Guided Gaussian SplattingZane K. J. Hartley, Lewis A. G. Stuart, Andrew P. French, Michael P. Pound. 7128-7138 [doi]
- AgMIC: Agricultural Masked Image Consistency for Cross-Domain SegmentationMuhib Ullah, Nisar Ali, Numair Nadeem, Abdul Bais. 7139-7149 [doi]
- Dynamic Monitoring of Crop Canopies Using Time-Series Point Clouds: Insights into Phenotypic Variation and Leaf-Level Photosynthetic PerformanceJiaren Zhou, Mengqi Zhang 0005, Shulin Sun, Man Zhang 0003, Minjuan Wang. 7150-7160 [doi]
- First Place Solution to the MLCAS 2025 GWFSS Challenge: The Devil is in the Detail and MinoritySongliang Cao, Tianqi Hu, Hao Lu. 7161-7168 [doi]
- High-Throughput Estimation of Photosynthetic Phenotypic Parameters Using Hyperspectral DataMengqi Zhang 0005, Jiaren Zhou, Man Zhang 0003, Minjuan Wang. 7169-7177 [doi]
- Comparative Analysis of Image-Based Deep Learning and Genomic Models for Yield and Protein Content Prediction in Winter WheatXiaoran Chen, Paraskevi Nousi, Mike Boss, Michele Volpi, Lukas Roth. 7178-7186 [doi]
- A Case for the Use of Chroma Cartesian Colour Representations for Image Classification on Plant-Based DomainsAlexis J. S. Payne, Gail Hopkins, Shreyank N. Gowda, Isaac Triguero, Michael P. Pound. 7187-7196 [doi]
- Modeling Time-Lapse Trajectories to Characterize Cranberry GrowthJohn Ronan, Anis Chihoub, Ryan Meegan, Gina Sidelli, Jeffery Neyhart, Peter Oudemans, Kristin J. Dana. 7208-7218 [doi]
- Improving Lightweight Weed Detection via Knowledge DistillationAhmet Oguz Saltik, Max Voigt, Sourav Modak, Mike Beckworth, Anthony Stein. 7219-7228 [doi]
- Multimodal Fusion of X-Ray Transmission and Dark Field Imaging for Apple Internal Disorders DetectionJiaqi He, Astrid Tempelaere, Janne Vignero, Pieter Verboven, Bart M. Nicolaï. 7229-7238 [doi]
- WeedSense: Multi-Task Learning for Weed Segmentation, Height Estimation, and Growth Stage ClassificationToqi Tahamid Sarker, Khaled R. Ahmed, Taminul Islam, Cristiana Bernardi Rankrape, Karla Gage. 7239-7249 [doi]
- DyTact: Capturing Dynamic Contacts in Hand-Object ManipulationXiaoyan Cong, Angela Xing, Chandradeep Pokhariya, Rao Fu 0003, Srinath Sridhar 0002. 7250-7262 [doi]
- HOSt3R: Keypoint-Free Hand-Object 3D Reconstruction from RGB ImagesAnilkumar Swamy, Vincent Leroy 0003, Philippe Weinzaepfel, Jean-Sébastien Franco, Grégory Rogez. 7263-7272 [doi]
- WACU: Multi-Modal Wristband Assistant for Contextual UnderstandingConstantin Patsch, Jaden Goter, Joseph Greer, Lingni Ma, Raj Sodhi. 7273-7282 [doi]
- PixCuboid: Room Layout Estimation from Multi-View Featuremetric AlignmentGustav Hanning, Kalle Åström, Viktor Larsson. 7283-7292 [doi]
- The Overlooked Value of Test-Time Reference Sets in Visual Place RecognitionMubariz Zaffar, Liangliang Nan, Sebastian Scherer, Julian F. P. Kooij. 7293-7302 [doi]
- LightGlueStick: A Fast and Robust Glue for Joint Point-Line MatchingAidyn Ubingazhibov, Rémi Pautrat, Iago Suárez, Shaohui Liu, Marc Pollefeys, Viktor Larsson. 7303-7313 [doi]
- Triangulation of 3D Target Points from Radar Range and Bearing DataMagnus Oskarsson. 7314-7323 [doi]
- PHAROS-AFE-AIMI: Multi-Source & Fair Disease DiagnosisDimitrios Kollias, Anastasios Arsenos, Stefanos D. Kollias. 7324-7332 [doi]
- Proactive HIV Care: AI-Based Comorbidity Prediction from Routine EHR DataSolomon Russom, Dimitrios Kollias, Qianni Zhang. 7333-7341 [doi]
- Med-VLM: Enhancing Medical Image Segmentation Accuracy Through Vision-Language ModelYihao Zhao, Enhao Zhong, Cuiyun Yuan, Yang Li, Man Zhao, Chunxia Li, Jun Hu, Wei Liu, Chenbin Liu. 7342-7352 [doi]
- Dopamine-Inspired Neuro-Modulated Radiomic Pipeline for Computed-Tomography-Driven Prediction of Immunotherapy Response in Metastatic Cancer TreatmentFrancesco Rundo, Massimo Orazio Spata, Carmelo Pino, Sebastiano Battiato. 7353-7362 [doi]
- Multi-Source Covid-19 Detection via Variance Risk ExtrapolationRuntian Yuan, Qingqiu Li, Junlin Hou, Jilan Xu, Yuejie Zhang, Rui Feng 0001, Hao Chen 0011. 7363-7370 [doi]
- In-Hoc Concept Representations to Regularise Deep Learning in Medical ImagingValentina Corbetta, Floris Six Dijkstra, Regina Beets-Tan, Hoel Kervadec, Kristoffer Wickstrøm, Wilson Silva. 7371-7380 [doi]
- Quantifying Inter-Operator Variability and its Causes for Medical Semantic SegmentationSophie Fischer, Irina Voiculescu. 7381-7389 [doi]
- Stable-Drift: A Patient-Aware Latent Drift Replay Method for Stabilizing Representations in Continual LearningParaskevi Antonia Theofilou, Anuhya Thota, Stefanos D. Kollias, Mamatha Thota. 7390-7399 [doi]
- Fairness in Breast Cancer Diagnosis with Deep LearningLaura Mata Le Bot, Sze Chai Kwok, Emmanuel Amankwaa-Frempong, Kofi Appiah. 7400-7406 [doi]
- ProtoMedX: Towards Explainable Multi-Modal Prototype Learning for Bone Health ClassificationAlvaro Lopez Pellicer, Andre Mariucci, Plamen Angelov 0001, Marwan Bukhari, Jemma G. Kerns. 7407-7416 [doi]
- DeepSeg-MS: A 3D Network Based on Hierarchical Multi-Scale Learning for MRI Multiple Sclerosis Lesion SegmentationAlessia Rondinella, Francesco Guarnera, Francesco Rundo, Sebastiano Battiato. 7417-7426 [doi]
- Advancing Lung Disease Diagnosis in 3D CT ScansQingqiu Li, Runtian Yuan, Junlin Hou, Jilan Xu, Yuejie Zhang, Rui Feng, Hao Chen 0011. 7427-7432 [doi]
- Lighten-MST: Low-Light Spectral Reconstruction via Illumination-Guided Spectral-Aware TransformerYanan Hu, Xiaodong Wang, Zijun He, Xin Yuan. 7433-7439 [doi]
- See the Invisible with SWIR: Learning to Enhance via Synthetic Noise ModelingHaiyang Jiang 0002, Hongjun Wang, Yinqiang Zheng. 7440-7449 [doi]
- Boosting Diffusion-Based Text Image Super-Resolution Model Towards Generalized Real-World ScenariosChenglu Pan, Xiaogang Xu 0009, Ganggui Ding, Yunke Zhang, Wenbo Li, Jiarong Xu, Qingbiao Wu. 7450-7459 [doi]
- Learning to See Through FlareXiaopeng Peng 0001, Heath Gemar, Erin F. Fleet, Kyle Novak, Abbie T. Watnik, Grover A. Swartzlander. 7460-7471 [doi]
- The Third Visual Object Tracking Segmentation VOTS2025 Challenge ResultsMatej Kristan, Jirí Matas, Pavel Tokmakov, Alan Lukezic, Michael Felsberg, Luka Cehovin Zajc, Khanh-Tung Tran, Xuan-Son Vu, Johanna Björklund, Michal Neoral, Hyung Jin Chang, Gustavo Fernández, Minasadat Attari, Matteo Dunnhofer, Wei Feng 0005, Zhenhua Feng 0001, Jin Gao, Yameng Gu, Ruize Han, Jiawei He, Zhenyu He 0001, Junhui Hou, Weiming Hu 0004, Xiantao Hu, Xingsen Huang, Yuqing Huang, Gleb Kirichenko, Josef Kittler, Yutong Kou, Simiao Lai, Bing Li, Xin Li, Shubo Lin, Huchuan Lu, Deshui Miao, Christian Micheloni, Juan David Mogollon, Moritz Nottebaum, Kannappan Palaniappan, Ziqi Pang, Zekun Qian, Gani Rahmon, Aleksandr Romanov 0001, Liangtao Shi, Roman A. Solovyev, Elham Soltani Kazemi, Imad Eddine Toubal, Jovana Videnovic, Dong Wang, Yaowei Wang, Yu-Xiong Wang, Zhixiang Wang, Xiaojun Wu 0001, Jinxia Xie, Tianyang Xu 0001, Chaocan Xue, Yuanliang Xue, Ming-Hsuan Yang 0001, Dmitriy Yurtov, Chunhui Zhang, Xiangqun Zhang, Yunfei Zhang, Qingfang Zheng, Bineng Zhong 0001, Fuan Zhong, Jinglin Zhou, Jingmeng Zhou, Junbao Zhou, Yong Zhou, Xuefeng Zhu 0003. 7472-7490 [doi]
- Exploring Spatial-Temporal Dynamics in Event-Based Facial Micro-Expression AnalysisNicolas Mastropasqua, Ignacio G. Bugueño-Córdova, Rodrigo Verschae, Daniel G. Acevedo, Pablo Negri, María E. Buemi. 7482-7491 [doi]
- Text Image Generation for Low-Resource Languages with Dual Translation LearningChihiro Noguchi, Shun Fukuda, Shoichiro Mihara, Masao Yamanaka. 7491-7501 [doi]
- TAP-VL: Text Layout Aware Pretraining for Enriched Vision-Language ModelsJonathan Fhima, Elad Ben-Avraham, Oren Nuriel, Yair Kittenplon, Roy Ganz, Aviad Aberdam, Ron Litman. 7502-7512 [doi]
- Structure-Aware Contrastive Learning for Diagram Understanding of Multimodal ModelsHiroshi Sasaki. 7513-7522 [doi]
- Quo Vadis Handwritten Text Generation for Handwritten Text Recognition?Vittorio Pippi, Konstantina Nikolaidou, Silvia Cascianelli, George Retsinas, Giorgos Sfikas, Rita Cucchiara, Marcus Liwicki. 7523-7533 [doi]
- Describe Anything Model for Visual Question Answering on Text-Rich ImagesYen-Linh Vu, Dinh-Thang Duong, Truong-Binh Duong, Anh-Khoi Nguyen, Thanh Huy Nguyen, Le Thien Phuc Nguyen, Jianhua Xing, Xingjian Li 0002, Tianyang Wang 0004, Ulas Bagci, Min Xu 0009. 7534-7544 [doi]
- ZOD: Zero-Shot and Out-of-Distribution Detection Dataset for Document ImagesTalha Uddin Sheikh, Sankalp Sinha, Shino Sam, Didier Stricker, Muhammad Zeshan Afzal. 7545-7555 [doi]
- CoSMo: A Multimodal Transformer for Page Stream Segmentation in Comic BooksMarc Serra Ortega, Emanuele Vivoli, Artemis Llabrés, Dimosthenis Karatzas. 7556-7564 [doi]
- Scanned Documents Forensics: Detecting Inserted Characters Through Noise and Chromatic ArtifactsMarina Gardella, Julieta Umpierrez, Antoine Tadros, Seginus Mowlavi, Natalia Bottaioli, Diego Belzarena, Gabriele Facciolo, Roy Y. He, Jean-Michel Morel, Rafael Grompone von Gioi. 7565-7575 [doi]
- PRISM: Pruning for Rank-adaptive Interpretable Segmentation Model with Application to Historical Document Multiband ImagesKilian Declercq, Abderrahmane Rahiche, Mohamed Cheriet. 7576-7585 [doi]
- DocSemi: Efficient Document Layout Analysis with Guided QueriesTahira Shehzadi, Ifza, Didier Stricker, Muhammad Zeshan Afzal. 7586-7596 [doi]
- DIVE-Doc: Downscaling Foundational Image Visual Encoder into Hierarchical Architecture for DocVQARayane Bencharef, Abderrahmane Rahiche, Mohamed Cheriet. 7597-7606 [doi]
- CTC Transcription Alignment of the Bullinger Letters: Automatic Improvement of Annotation QualityMarco Peer, Anna Scius-Bertrand, Andreas Fischer. 7607-7616 [doi]
- Deep Learning-Based Intrusion Detection Systems for Phishing Email Detection: A Short SurveyAxel De Nardin, Silvia Zottin, Claudio Piciarelli, Gian Luca Foresti. 7617-7625 [doi]
- Improved Information Extraction by Leveraging Multi-Hypothesis OCR at Inference TimeArthur Hemmer, Nicola Bartolo, Mickaël Coustaty, Jean-Marc Ogier. 7626-7634 [doi]
- A Survey on Reading Order, Table of Contents, and Structure Extraction in Document AnalysisSimone Giovannini, Simone Marinai. 7635-7644 [doi]
- ChemMiner: A Large Language Model Agent System for Chemical Literature Data MiningKexin Chen 0003, Yuyang Du 0001, Junyou Li, Hanqun Cao, Menghao Guo, Xilin Dang, Lanqing Li, Jiezhong Qiu, Guangyong Chen, Pheng-Ann Heng. 7645-7653 [doi]
- CoPa-SG: Dense Scene Graphs with Parametric and Proto-RelationsJulian Lorenz, Mrunmai Phatak, Robin Schön, Katja Ludwig, Nico Hörmann, Annemarie Friedrich, Rainer Lienhart. 7654-7663 [doi]
- Measuring Image-Relation Alignment: Reference-Free Evaluation of VLMs and Synthetic Pre-Training for Open-Vocabulary Scene Graph GenerationMaëlic Neau, Zoe Falomir, Cédric Buche, Akihiro Sugimoto. 7664-7673 [doi]
- Long-Form Reasoning for Keystep Recognition Using Graph Neural NetworksJulia Lee Romero, Kyle Min 0001, Subarna Tripathi, Morteza Karimzadeh. 7674-7683 [doi]
- An Information-Theoretic Approach to Diversity Evaluation of Prompt-Based Generative ModelsMohammad Jalali, Azim Ospanov, Amin Gohari, Farzan Farnia. 7684-7693 [doi]
- On the Distributed Evaluation of Generative ModelsZixiao Wang 0001, Farzan Farnia, Zhenghao Lin, Yunheng Shen, Bei Yu 0001. 7694-7703 [doi]
- Modeling Cognitive and Implicit Biases in Multi-Agent Medical Systems for Clinical DiagnosesNiyel Hassan, Benjamin Liu, Raghav Thallapragada, Ryan Bui, Nishant Chinta, Ayushman Bisht, Dev J. Chaliha, Kevin Zhu. 7704-7712 [doi]
- Evaluating the Impact of Racial Cues on MLLMs Judgements of Politeness and OffensivenessMahammed Kamruzzaman, Gene Louis Kim. 7713-7722 [doi]
- BAdd: Bias Mitigation Through Bias AdditionIoannis Sarridis, Christos Koutlis, Symeon Papadopoulos, Christos Diou. 7723-7732 [doi]
- Fairness without Labels: Pseudo-Balancing for Bias Mitigation in Face Gender ClassificationHaohua Dong, Ana Manzano Rodríguez, Camille Guinaudeau, Shin'ichi Satoh 0001. 7733-7742 [doi]