- The Third Monocular Depth Estimation ChallengeJaime Spencer, Fabio Tosi, Matteo Poggi, Ripudaman Singh Arora, Chris Russell 0001, Simon Hadfield, Richard Bowden, Guangyuan Zhou, Zhengxin Li, Qiang Rao, Yiping Bao, Xiao Liu 0004, Dohyeong Kim, Jinseong Kim, MyungHyun Kim, Mykola Lavreniuk, Rui Li 0013, Qing Mao, Jiang Wu, Yu Zhu 0004, Jinqiu Sun, Yanning Zhang, Suraj Patni, Aradhye Agarwal, Chetan Arora 0002, Pihai Sun, Kui Jiang, Gang Wu 0010, Jian Liu, Xianming Liu, Junjun Jiang, Xidan Zhang, Jianing Wei, Fangjun Wang, Zhiming Tan, Jiabao Wang, Albert Luginov, Muhammad Shahzad, Seyed-Hosseini, Aleksander Trajcevski, James H. Elder. 1-14 [doi]
- UAV-Rain1k: A Benchmark for Raindrop Removal from UAV Aerial ImageryWenhui Chang, Hongming Chen, Xin He, Xiang Chen, Liangduo Shen. 15-22 [doi]
- Feature Corrective Transfer Learning: End-to-End Solutions to Object Detection in Non-Ideal Visual ConditionsChuheng Wei, Guoyuan Wu 0001, Matthew J. Barth. 23-32 [doi]
- Source-Free Domain Adaptation of Weakly-Supervised Object Localization Models for HistologyAlexis Guichemerre, Soufiane Belharbi, Tsiry Mayet, Shakeeb Murtaza, Pourya Shamsolmoali, Luke McCaffrey, Eric Granger. 33-43 [doi]
- Mobile Aware Denoiser Network (MADNet) for Quad Bayer ImagesPavan C. Madhusudana, Jing Li, Zeeshan Nadir, Hamid R. Sheikh, Seok-Jun Lee. 44-52 [doi]
- VolRAFT: Volumetric Optical Flow Network for Digital Volume Correlation of Synchrotron Radiation-based Micro-CT Images of Bone-Implant InterfacesTak Ming Wong, Julian Moosmann, Berit Zeller-Plumhoff. 53-62 [doi]
- Damage Detection and Localization by Learning Deep Features of Elastic Waves in Piezoelectric Ceramic Using Point Contact MethodPragyan Banerjee, Pranjal Saxena, Nur M. M. Kalimullah, Amit Shelke, Anowarul Habib. 63-70 [doi]
- Self-Supervised Learning with Generative Adversarial Networks for Electron MicroscopyBashir Kazimi, Karina Ruzaeva, Stefan Sandfeld. 71-81 [doi]
- Towards Explainable Visual Vessel Recognition Using Fine-Grained Classification and Image RetrievalHeiko Karus, Friedhelm Schwenker, Michael Munz 0001, Michael Teutsch. 82-92 [doi]
- Towards Efficient Machine Unlearning with Data Augmentation: Guided Loss-Increasing (GLI) to Prevent the Catastrophic Model Utility DropDasol Choi, Soora Choi, Eunsun Lee, Jinwoo Seo, Dongbin Na. 93-102 [doi]
- Enforcing Conditional Independence for Fair Representation Learning and Causal Image GenerationJensen Hwa, Qingyu Zhao, Aditya Lahiri, Adnan Masood, Babak Salimi, Ehsan Adeli 0001. 103-112 [doi]
- Improving the Robustness of 3D Human Pose Estimation: A Benchmark Dataset and Learning from Noisy InputTrung-Hieu Hoang, Mona Zehni, Huy Phan, Duc Minh Vo, Minh N. Do. 113-123 [doi]
- DIA: Diffusion based Inverse Network Attack on Collaborative InferenceDake Chen, Shiduo Li, Yuke Zhang, Chenghao Li, Souvik Kundu 0002, Peter A. Beerel. 124-130 [doi]
- ReweightOOD: Loss Reweighting for Distance-based OOD DetectionSudarshan Regmi, Bibek Panthi, Yifei Ming, Prashnna K. Gyawali, Danail Stoyanov, Binod Bhattarai. 131-141 [doi]
- Our Deep CNN Face Matchers Have Developed AchromatopsiaAman Bhatta, Domingo Mery, Haiyu Wu, Joyce Annan, Michael C. King, Kevin W. Bowyer. 142-152 [doi]
- T2FNorm: Train-time Feature Normalization for OOD Detection in Image ClassificationSudarshan Regmi, Bibek Panthi, Sakar Dotel, Prashnna K. Gyawali, Danail Stoyanov, Binod Bhattarai. 153-162 [doi]
- Fractals as Pre-training Datasets for Anomaly Detection and LocalizationCynthia Ifeyinwa Ugwu, Sofia Casarin, Oswald Lanz. 163-172 [doi]
- Test-time Assessment of a Model's Performance on Unseen Domains via Optimal TransportAkshay Mehra, Yunbei Zhang, Jihun Hamm. 173-182 [doi]
- Robust and Explainable Fine-Grained Visual Classification with Transfer Learning: A Dual-Carriageway FrameworkZheming Zuo, Joseph Smith, Jonathan Stonehouse, Boguslaw Obara. 183-193 [doi]
- Practical Region-level Attack against Segment Anything ModelsYifan Shen, Zhengyuan Li, Gang Wang. 194-203 [doi]
- SkipPLUS: Skip the First Few Layers to Better Explain Vision TransformersFaridoun Mehri, Mohsen Fayyaz, Mahdieh Soleymani Baghshah, Mohammad Taher Pilehvar. 204-215 [doi]
- AR-CP: Uncertainty-Aware Perception in Adverse Conditions with Conformal Prediction and Augmented Reality For Assisted DrivingAchref Doula, Max Mühlhäuser, Alejandro Sánchez Guinea. 216-226 [doi]
- Fast-NTK: Parameter-Efficient Unlearning for Large-Scale ModelsGuihong Li, Hsiang Hsu, Chun-Fu Richard Chen, Radu Marculescu. 227-234 [doi]
- Mitigating Bias Using Model-Agnostic Data AttributionSander De Coninck, Sam Leroux, Pieter Simoens. 235-243 [doi]
- RLNet: Robust Linearized Networks for Efficient Private InferenceSreetama Sarkar, Souvik Kundu 0002, Peter A. Beerel. 244-253 [doi]
- Data-free Defense of Black Box Models Against Adversarial AttacksGaurav Kumar Nayak, Inder Khatri, Ruchit Rawal, Anirban Chakraborty 0001. 254-263 [doi]
- An End-to-End Approach for Handwriting Recognition: From Handwritten Text Lines to Complete PagesDayvid Castro, Byron Leite Dantas Bezerra, Cleber Zanchettin. 264-273 [doi]
- Enhancing Image Classification Robustness through Adversarial Sampling with Delta Data Augmentation (DDA)Iván Reyes-Amezcua, Gilberto Ochoa-Ruiz, Andres Mendez-Vazquez. 274-283 [doi]
- High-Resolution Detection of Earth Structural Heterogeneities from Seismic Amplitudes using Convolutional Neural Networks with Attention layersLuiz Schirmer, Guilherme G. Schardong, Vinícius da Silva, Rogério Santos, Hélio Lopes 0001. 284-292 [doi]
- Beyond Appearances: Material Segmentation with Embedded Spectral Information from RGB-D imageryFabian Perez, Hoover Rueda-Chacon. 293-301 [doi]
- ST-Gait++: Leveraging spatio-temporal convolutions for gait-based emotion recognition on videosMaria Luísa Lima, Willams de Lima Costa, Estefania Talavera Martínez, Veronica Teichrieb. 302-310 [doi]
- The Myth of the PyramidRamon Izquierdo-Cordova, Walterio W. Mayol-Cuevas. 311-321 [doi]
- GPT as Psychologist? Preliminary Evaluations for GPT-4V on Visual Affective ComputingHao Lu 0009, Xuesong Niu, Jiyao Wang, Yin Wang, Qingyong Hu, Jiaqi Tang 0005, Yuting Zhang, Kaishen Yuan, Bin Huang, Zitong Yu, Dengbo He, ShuiGuang Deng, Hao Chen, Yingcong Chen, Shiguang Shan. 322-331 [doi]
- NurtureNet: A Multi-task Video-based Approach for Newborn AnthropometryYash Khandelwal, Mayur Arvind, Sriram Kumar, Ashish Gupta, Sachin Kumar Danisetty, Piyush Bagad, Anish Madan, Mayank Lunayach, Aditya Annavajjala, Abhishek Maiti, Sansiddh Jain, Aman Dalmia, Namrata Deka, Jerome White, Jigar Doshi, Angjoo Kanazawa, Rahul Panicker, Alpan Raval, Srinivas Rana, Makarand Tapaswi. 332-342 [doi]
- Vision-language models for decoding provider attention during neonatal resuscitationFelipe Parodi, Jordan K. Matelsky, Alejandra Regla-Vargas, Elizabeth E. Foglia, Charis Lim, Danielle Weinberg, Konrad P. Kording, Heidi M. Herrick, Michael L. Platt. 343-353 [doi]
- Orientation-conditioned Facial Texture Mapping for Video-based Facial Remote Photoplethysmography EstimationSam Cantrill, David Ahmedt-Aristizabal, Lars Petersson, Hanna Suominen, Mohammad Ali Armin. 354-363 [doi]
- Paediatric Pulse Rate Measurements: a Comparison of Methods using Remote PhotoplethysmographySimon Wegerif, Ivan Veleslavov, Lieke Dorine van Putten, Kate Emily Bamford, Gauri Misra, Niall Mullen. 364-370 [doi]
- DECNet: A Non-Contacting Dual-Modality Emotion Classification Network for Driver Health MonitoringZhekang Dong, Chenhao Hu, Shiqi Zhou, Liyan Zhu, Junfan Wang, Yi Chen, Xudong Lv, Xiaoyue Ji. 371-379 [doi]
- Refining Remote Photoplethysmography Architectures using CKA and Empirical MethodsNathan Vance, Patrick J. Flynn. 380-388 [doi]
- Analyzing Participants' Engagement during Online Meetings Using Unsupervised Remote Photoplethysmography with Behavioral FeaturesAlexander Vedernikov, Zhaodong Sun, Virpi-Liisa Kykyri, Mikko Pohjola, Miriam Nokia, Xiaobai Li. 389-399 [doi]
- Video Based Computational Coding of Movement Anomalies in ASD ChildrenPriya Singh, Abhishek Pathak, Umer Jon Ganai, Braj Bhushan, Venkatesh K. Subramanian. 400-409 [doi]
- How Suboptimal is Training rPPG Models with Videos and Targets from Different Body Sites?Björn Braun, Daniel McDuff, Christian Holz 0001. 410-418 [doi]
- UrbanSARFloods: Sentinel-1 SLC-Based Benchmark Dataset for Urban and Open-Area Flood MappingJie Zhao, Zhitong Xiong, Xiaoxiang Zhu 0001. 419-429 [doi]
- Exploring Robust Features for Few-Shot Object Detection in Satellite ImageryXavier Bou, Gabriele Facciolo, Rafael Grompone von Gioi, Jean-Michel Morel, Thibaud Ehret. 430-439 [doi]
- Efficient local correlation volume for unsupervised optical flow estimation on small moving objects in large satellite imagesSarra Khairi, Etienne Meunier, Renaud Fraisse, Patrick Bouthemy. 440-448 [doi]
- Deep Generative Data Assimilation in Multimodal SettingYongquan Qu, Juan Nathaniel, Shuolin Li, Pierre Gentine. 449-459 [doi]
- GeoSynth: Contextually-Aware High-Resolution Satellite Image SynthesisSrikumar Sastry, Subash Khanal, Aayush Dhakal, Nathan Jacobs. 460-470 [doi]
- Implicit Assimilation of Sparse In Situ Data for Dense & Global Storm Surge ForecastingPatrick Ebel 0002, Brandon Victor, Peter Naylor, Gabriele Meoni, Federico Serva, Rochelle Schneider. 471-480 [doi]
- Detecting Out-Of-Distribution Earth Observation Images with Diffusion ModelsGeorges Le Bellier, Nicolas Audebert. 481-491 [doi]
- (Street) Lights Will Guide You: Georeferencing Nighttime Astronaut Photography of EarthAlex Stoken, Peter Ilhardt, Mark Lambert, Kenton Fisher. 492-501 [doi]
- Cross-sensor super-resolution of irregularly sampled Sentinel-2 time seriesAimi Okabayashi, Nicolas Audebert, Simon Donike, Charlotte Pelletier. 502-511 [doi]
- SyntStereo2Real: Edge-Aware GAN for Remote Sensing Image-to-Image Translation while Maintaining Stereo ConstraintVasudha Venkatesan, Daniel Panangian, Mario Fuentes Reyes, Ksenia Bittner. 512-521 [doi]
- SUNDIAL: 3D Satellite Understanding through Direct, Ambient, and Complex Lighting DecompositionNikhil Behari, Akshat Dave, Kushagra Tiwary, William Yang, Ramesh Raskar. 522-532 [doi]
- Sat2Cap: Mapping Fine-Grained Textual Descriptions from Satellite ImagesAayush Dhakal, Adeel Ahmad, Subash Khanal, Srikumar Sastry, Hannah Kerner, Nathan Jacobs. 533-542 [doi]
- Unsupervised Domain Adaptation Architecture Search with Self-Training for Land Cover MappingClifford Broni-Bediako, Junshi Xia, Naoto Yokoya. 543-553 [doi]
- Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMsJonathan Roberts, Timo Lüddecke, Rehan Sheikh, Kai Han 0001, Samuel Albanie. 554-563 [doi]
- Radar Fields: An Extension of Radiance Fields to SARThibaud Ehret, Roger Marí, Dawa Derksen, Nicolas Gasnier, Gabriele Facciolo. 564-574 [doi]
- Contrastive Pretraining for Visual Concept Explanations of Socioeconomic OutcomesIvica Obadic, Alex Levering, Lars Pennig, Dário A. B. Oliveira, Diego Marcos, Xiaoxiang Zhu 0001. 575-584 [doi]
- GeoLLM-Engine: A Realistic Environment for Building Geospatial CopilotsSimranjit Singh 0003, Michael Fore, Dimitrios Stamoulis. 585-594 [doi]
- Let me show you how it's done - Cross-modal knowledge distillation as pretext task for semantic segmentationRudhishna Narayanan Nair, Ronny Hänsch. 595-603 [doi]
- Spatio-Temporal Attention and Gaussian Processes for Personalized Video Gaze EstimationSwati Jindal, Mohit Yadav, Roberto Manduchi. 604-614 [doi]
- Exploring the Zero-Shot Capabilities of Vision-Language Models for Improving Gaze FollowingAnshul Gupta, Pierre Vuillecard, Arya Farkhondeh, Jean-Marc Odobez. 615-624 [doi]
- Gaze Scanpath Transformer: Predicting Visual Search Target by Spatiotemporal Semantic Modeling of Gaze ScanpathTakumi Nishiyasu, Yoichi Sato. 625-635 [doi]
- GESCAM: A Dataset and Method on Gaze Estimation for Classroom Attention MeasurementAthul M. Mathew, Arshad Ali Khan, Thariq Khalid, Riad Souissi. 636-645 [doi]
- Semi-Stereo: A Universal Stereo Matching Framework for Imperfect Data via Semi-supervised LearningXin Yue, Zongqing Lu, Xiangru Lin, Wenjia Ren, Zhijing Shao, Haonan Hu, Yu Zhang 0166, Qingmin Liao. 646-655 [doi]
- MonoSelfRecon: Purely Self-Supervised Explicit Generalizable 3D Reconstruction of Indoor Scenes from Monocular RGB ViewsRunfa Li, Upal Mahbub, Vasudev Bhaskaran, Truong Q. Nguyen. 656-666 [doi]
- Lifting Multi-View Detection and Tracking to the Bird's Eye ViewTorben Teepe, Philipp Wolters, Johannes Gilg, Fabian Herzog, Gerhard Rigoll. 667-676 [doi]
- 3D Clothed Human Reconstruction from Sparse Multi-View ImagesJin Gyu Hong, Seung Young Noh, Hee-Kyung Lee, Won-Sik Cheong, Ju Yong Chang. 677-687 [doi]
- SACReg: Scene-Agnostic Coordinate Regression for Visual LocalizationJérôme Revaud, Yohann Cabon, Romain Brégier, Jongmin Lee, Philippe Weinzaepfel. 688-698 [doi]
- DepthVoting: A Few-Shot Point Cloud Classification Model Incorporating a Projection-Based Voting MechanismYunhui Zhu, Jiajing Chen, Senem Velipasalar. 699-707 [doi]
- Cross-Modal Self-Training: Aligning Images and Pointclouds to learn Classification without LabelsAmaya Dharmasiri, Muzammal Naseer, Salman Khan 0001, Fahad Shahbaz Khan. 708-717 [doi]
- MIMIC: Masked Image Modeling with Image CorrespondencesKalyani Marathe, Mahtab Bigverdi, Nishat Khan, Tuhin Kundu, Patrick Howe, Sharan Ranjit S, Anand Bhattad, Aniruddha Kembhavi, Linda G. Shapiro, Ranjay Krishna. 718-727 [doi]
- Selective Multi-View Deep Model for 3D Object ClassificationMona Saleh Alzahrani, Muhammad Usman, Saeed Anwar, Tarek Helmy. 728-736 [doi]
- From 2D Portraits to 3D Realities: Advancing GAN Inversion for Enhanced Image SynthesisWonseok Oh, Youngjoo Jo. 737-746 [doi]
- DGBD: Depth Guided Branched Diffusion for Comprehensive Controllability in Multi-View GenerationHovhannes Margaryan, Daniil Hayrapetyan, Wenyan Cong, Zhangyang Wang, Humphrey Shi. 747-756 [doi]
- 2T-UNET: A Two-Tower UNet with Depth Clues for Robust Stereo Depth EstimationMansi Sharma, Rohit Choudhary, Rithvik Anil. 757-764 [doi]
- AgileGAN3D: Few-Shot 3D Portrait Stylization by Augmented Transfer LearningGuoxian Song. 765-774 [doi]
- Color-cued Efficient Densification Method for 3D Gaussian SplattingSieun Kim, Kyungjin Lee, Youngki Lee. 775-783 [doi]
- PointOfView: A Multi-modal Network for Few-shot 3D Point Cloud Classification Fusing Point and Multi-view Image FeaturesHuantao Ren, Jiyang Wang, Minmin Yang, Senem Velipasalar. 784-793 [doi]
- OGRMPI: An Efficient Multiview Integrated Multiplane Image based on Occlusion Guided ResidualsDae Yeol Lee, Guan-Ming Su, Peng Yin 0002. 794-802 [doi]
- Sparse multi-view hand-object reconstruction for unseen environmentsYik Lung Pang, Changjae Oh, Andrea Cavallaro. 803-810 [doi]
- Depth-Regularized Optimization for 3D Gaussian Splatting in Few-Shot ImagesJaeyoung Chung, Jeongtaek Oh, Kyoung Mu Lee. 811-820 [doi]
- LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic InsightsThibault Castells, Hyoung-Kyu Song, Bo-kyeong Kim, Shinkook Choi. 821-830 [doi]
- EdgeRelight360: Text-Conditioned 360-Degree HDR Image Generation for Real-Time On-Device Video Portrait RelightingMin-Hui Lin 0003, Mahesh Reddy, Guillaume Berger, Michel Sarkis, Fatih Porikli, Ning Bi. 831-840 [doi]
- Camera Motion Estimation from RGB-D-Inertial Scene FlowSamuel Cerezo, Javier Civera 0001. 841-849 [doi]
- BAA-NGP: Bundle-Adjusting Accelerated Neural Graphics PrimitivesSainan Liu, Shan Lin, Jingpei Lu, Alexey Supikov, Michael C. Yip. 850-857 [doi]
- Weakly Supervised End2End Deep Visual OdometryAmin Abouee, Ashwanth Ravi, Lars Hinneburg, Mateusz Dziwulski, Florian Ölsner, Jürgen Hess 0005, Stefan Milz, Patrick Mäder. 858-865 [doi]
- Connecting NeRFs, Images, and TextFrancesco Ballerini, Pierluigi Zama Ramirez, Roberto Mirabella, Samuele Salti, Luigi di Stefano. 866-876 [doi]
- Contextualising Implicit Representations for Semantic TasksTheo W. Costain, Kejie Li, Victor Adrian Prisacariu. 877-887 [doi]
- StegaNeRV: Video Steganography using Implicit Neural RepresentationMonsij Biswal, Tong Shao, Kenneth Rose, Peng Yin 0002, Sean McCarthy. 888-898 [doi]
- ImplicitTerrain: a Continuous Surface Model for Terrain Data AnalysisHaoan Feng, Xin Xu, Leila De Floriani. 899-909 [doi]
- Reference-based GAN Evaluation by Adaptive InversionJianbo Wang, Heliang Zheng, Toshihiko Yamasaki. 910-918 [doi]
- Unified Physical-Digital Attack Detection ChallengeHaocheng Yuan, Ajian Liu, Junze Zheng, Jun Wan 0001, Jiankang Deng, Sergio Escalera, Hugo Jair Escalante, Isabelle Guyon, Zhen Lei 0001. 919-929 [doi]
- Multi-angle Consistent Generative NeRF with Additive Angular Margin Momentum Contrastive LearningHang Zou, Hui Zhang, Yuan Zhang, Hui Ma, Dexin Zhao, Qi Zhang, Qi Li. 930-939 [doi]
- Rethinking the Domain Gap in Near-infrared Face RecognitionMichail Tarasiou, Jiankang Deng, Stefanos Zafeiriou. 940-949 [doi]
- IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image ModelsSiying Cui, Jia Guo, Xiang An, Jiankang Deng, Yongle Zhao, Xinyu Wei, Ziyong Feng. 950-959 [doi]
- Unified Face Attack Detection with Micro Disturbance and a Two-Stage Training StrategyJiaruo Yu, Dagong Lu, Xingyue Shi, Chenfan Qu, Fengjun Guo. 960-969 [doi]
- Advancing Cross-Domain Generalizability in Face Anti-Spoofing: Insights, Design, and MetricsHyoJin Kim, Jiyoon Lee, Yonghyun Jeong, Haneol Jang, Youngjoon Yoo. 970-979 [doi]
- Supervised Contrastive Learning for Snapshot Spectral Imaging Face Anti-SpoofingChuanbiao Song, Yan Hong, Jun Lan, Huijia Zhu, Weiqiang Wang, Jianfu Zhang 0003. 980-985 [doi]
- A visualization method for data domain changes in CNN networks and the optimization method for selecting thresholds in classification tasksMinzhe Huang, Changwei Nie, Weihong Zhong. 986-994 [doi]
- Joint Physical-Digital Facial Attack Detection Via Simulating Spoofing CluesXianhua He, Dashuang Liang, Song Yang, Zhanlong Hao, Hui Ma, Binjie Mao, Xi Li, Yao Wang, Pengfei Yan, Ajian Liu. 995-1004 [doi]
- Snapshot Spectral Imaging for Face Anti-Spoofing: Addressing Data Challenges with Advanced Processing and TrainingHui Li, Yaowen Xu, Zhaofan Zou, Zhixiang He. 1005-1012 [doi]
- Multiattention-Net: A Novel Approach to Face Anti-Spoofing with Modified Squeezed Residual BlocksSabari Nathan, M. Parisa Beham, A Nagaraj, S. Mohamed Mansoor Roomi. 1013-1020 [doi]
- Assessing the Performance of Efficient Face Anti-Spoofing Detection Against Physical and Digital Presentation AttacksLuis S. Luevano, Yoanna Martínez-Díaz, Heydi Méndez-Vázquez, Miguel González-Mendoza 0001, Davide Frey. 1021-1028 [doi]
- MixStyle-Based Contrastive Test-Time Adaptation: Pathway to Domain GeneralizationKota Yamashita, Kazuhiro Hotta. 1029-1037 [doi]
- Fully Test-time Adaptation for Object DetectionXiaoqian Ruan, Wei Tang. 1038-1047 [doi]
- Test-time Specialization of Dynamic Neural NetworksSam Leroux, Dewant Katare, Aaron Yi Ding, Pieter Simoens. 1048-1056 [doi]
- ST2ST: Self-Supervised Test-time Adaptation for Video Action RecognitionMasud An Nur Islam Fahim, Mohammed Innat, Jani Boutellier. 1057-1066 [doi]
- Unknown Sample Discovery for Source Free Open Set Domain AdaptationChowdhury Sadman Jahan, Andreas E. Savakis. 1067-1076 [doi]
- UDAC: Under-Display Array CamerasChengyu Wang 0011, Jing Li, Pavan C. Madhusudanarao, Jinhan Hu, Jitesh K. Singh, WooJhon Choi, Seok-Jun Lee, Hamid R. Sheikh. 1077-1084 [doi]
- 2NM: Extremely Low-light Noise Modeling Through Diffusion IterationJiahao Qin, Pinle Qin, Rui Chai, Jia Qin, Zanxia Jin. 1085-1094 [doi]
- Event Camera Demosaicing via Swin Transformer and Pixel-focus LossYunfan Lu, Yijie Xu, Wenzong Ma, Weiyu Guo, Hui Xiong 0001. 1095-1105 [doi]
- From Synthetic to Real: A Calibration-free Pipeline for Few-shot Raw Image DenoisingRuoqi Li, Chang Liu 0030, Ziyi Wang 0006, Yao Du, Jingjing Yang, Long Bao, Heng Sun. 1106-1114 [doi]
- LaDiffGAN: Training GANs with Diffusion Supervision in Latent SpacesXuhui Liu, Bohan Zeng, Sicheng Gao, Shanglin Li, Yutang Feng, Hong Li, Boyu Liu, Jianzhuang Liu, Baochang Zhang 0001. 1115-1125 [doi]
- DemosaicFormer: Coarse-to-Fine Demosaicing Network for HybridEVS CameraSenyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha. 1126-1135 [doi]
- MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and ResultsYaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li 0002, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li 0009, Shu Chen, Liang Ma, Shiyang Zhou, Haijin Zeng, Kai-Feng, Yongyong Chen, Jingyong Su, Xianyu Guan, Hongyuan Yu, Cheng Wan 0006, Jiamin Lin, Binnan Han, Yajun Zou, Zhuoyuan Wu, Yuan Huang, Yongsheng Yu, Daoan Zhang, JiZhe Li, Xuanwu Yin, Kunlong Zuo, Yunfan Lu, Yijie Xu, Wenzong Ma, Weiyu Guo, Hui Xiong 0001, Wei Yu, Bingchun Luo, Sabari Nathan, Priya Kansal. 1136-1143 [doi]
- MIPI 2024 Challenge on Nighttime Flare Removal: Methods and ResultsYuekun Dai, Dafeng Zhang, Xiaoming Li 0002, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu 0005, Chen Change Loy. 1144-1152 [doi]
- MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and ResultsXin Jin 0005, Chunle Guo, Xiaoming Li 0002, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Ruoqi Li, Chang Liu 0030, Ziyi Wang 0006, Yao Du, Jingjing Yang, Long Bao, Heng Sun, Xiangyu Kong, Xiaoxia Xing, Jinlong Wu, Yuanyang Xue, Hyunhee Park, Sejun Song, Changho Kim, Jingfan Tan, Wenhan Luo, Zikun Liu, Mingde Qiao, Junjun Jiang, Kui Jiang, Yao Xiao, Chuyang Sun, Jinhui Hu, Weijian Ruan, Yubo Dong, Kai Chen 0026, Hyejeong Jo, Jiahao Qin, Bingjie Han, Pinle Qin, Rui Chai, Pengyuan Wang. 1153-1161 [doi]
- How to Benchmark Vision Foundation Models for Semantic Segmentation?Tommie Kerssies, Daan de Geus, Gijs Dubbelman. 1162-1171 [doi]
- Exploring the Benefits of Vision Foundation Models for Unsupervised Domain AdaptationBrunó Bence Englert, Fabrizio J. Piva, Tommie Kerssies, Daan de Geus, Gijs Dubbelman. 1172-1180 [doi]
- Towards Learning Image Similarity from General Triplet LabelsRadu Dondera. 1181-1190 [doi]
- Coarse or Fine? Recognising Action End States without LabelsDavide Moltisanti, Hakan Bilen, Laura Sevilla-Lara, Frank Keller. 1191-1200 [doi]
- Leveraging Large Language Models for Multimodal SearchOriol Barbany, Michael Huang, Xinliang Zhu, Arnab Dhua. 1201-1210 [doi]
- ConceptHash: Interpretable Fine-Grained Hashing via Concept DiscoveryKam Woh Ng, Xiatian Zhu, Yi-Zhe Song, Tao Xiang 0002. 1211-1223 [doi]
- Making use of unlabeled data: Comparing strategies for marine animal detection in long-tailed datasets using self-supervised and semi-supervised pre-trainingTarun Sharma, Danelle E. Cline, Duane Edgington. 1224-1233 [doi]
- HyperLeaf2024 - A Hyperspectral Imaging Dataset for Classification and Regression of Wheat LeavesWilliam Michael Laprade, Pawel Pieta, Svetlana Kutuzova, Jesper Cairo Westergaard, Mads Nielsen, Svend Christensen, Anders Bjorholm Dahl. 1234-1243 [doi]
- Monitoring Social Insect Activity with Minimal Human SupervisionTarun Sharma, Julian Morgan Wagner, Sara Beery, William B. Dickson, Michael H. Dickinson, Joseph Parker. 1244-1253 [doi]
- Sensor Equivariance: A Framework for Semantic Segmentation with Diverse Camera ModelsHannes Reichert, Manuel Hetzel, Andreas Hubert, Konrad Doll, Bernhard Sick. 1254-1261 [doi]
- Estimating Depth of Monocular Panoramic Image with Teacher-Student Model Fusing Equirectangular and Spherical RepresentationsJingguo Liu, Yijun Xu, Shigang Li 0001, Jianfeng Li 0003. 1262-1271 [doi]
- BGDNet: Background-guided Indoor Panorama Depth EstimationJiajing Chen, Zhiqiang Wan, Manjunath Narayana, Yuguang Li, Will Hutchcroft, Senem Velipasalar, Sing Bing Kang. 1272-1281 [doi]
- DQ-HorizonNet: Enhancing Door Detection Accuracy in Panoramic Images via Dynamic QuantizationCing-Jia Lin, Jheng-Wei Su, Kai-Wen Hsiao, Ting-Yu Yen, Chih-Yuan Yao, Hung-Kuo Chu. 1282-1289 [doi]
- Cross-Domain Synthetic-to-Real In-the-Wild Depth and Normal Estimation for 3D Scene UnderstandingJay Bhanushali, Manivannan Muniyandi, Praneeth Chakravarthula. 1290-1300 [doi]
- Impact of Video Compression Artifacts on Fisheye Camera Visual Perception TasksMadhumitha Sakthi, Louis Kerofsky, Varun Ravi Kumar, Senthil Kumar Yogamani. 1301-1310 [doi]
- MultiPanoWise: holistic deep architecture for multi-task dense prediction from a single panoramic imageUzair Shah, Muhammad Tukur, Mahmood Alzubaidi, Giovanni Pintore, Enrico Gobbetti, Mowafa S. Househ, Jens Schneider 0002, Marco Agus. 1311-1321 [doi]
- Multi-scale Attention-Based Inclination Angles Estimation for Panoramic CameraYuhao Shan, Heyu Chen, Jiaying Zhang, Shigang Li 0001, Jianfeng Li 0003. 1322-1330 [doi]
- FisheyeBEVSeg: Surround View Fisheye Cameras based Bird's-Eye View Segmentation for Autonomous DrivingSenthil Kumar Yogamani, David Unger, Venkatraman Narayanan, Varun Ravi Kumar. 1331-1334 [doi]
- Exploring the Limits: Applying State-of-the-Art Stereo Matching Algorithms to Rectified Ultra-Wide StereoFilip Slezak, Morten Stigaard Laursen, Thomas B. Moeslund. 1335-1344 [doi]
- Gain-first or Exposure-first: Benchmark for Better Low-light Video Photography and EnhancementHaiyang Jiang 0002, Zhihang Zhong, Yinqiang Zheng. 1345-1356 [doi]
- Point-Supervised Semantic Segmentation of Natural Scenes via Hyperspectral ImagingTianqi Ren, Qiu Shen, Ying Fu 0001, Shaodi You. 1357-1367 [doi]
- Computational Spectral Imaging with Unified Encoding Model and BeyondXinyuan Liu, Lingen Li, Lin Zhu, Lizhi Wang. 1368-1378 [doi]
- ViTKD: Feature-based Knowledge Distillation for Vision TransformersZhendong Yang, Zhe Li, Ailing Zeng, Zexian Li, Chun Yuan, Yu Li 0003. 1379-1388 [doi]
- Generalized Foggy-Scene Semantic Segmentation by Frequency DecouplingQi Bi, Shaodi You, Theo Gevers. 1389-1399 [doi]
- Generating Material-Aware 3D Models from Sparse ViewsShi Mao, Chenming Wu, Ran Yi, Zhelun Shen, Liangjun Zhang, Wolfgang Heidrich. 1400-1409 [doi]
- Physics Based Camera Privacy: Lens and Network Co-Design to the RescueMarius Dufraisse, Marcela Carvalho, Pauline Trouvé-Peloux, Frédéric Champagnat. 1410-1419 [doi]
- Imaging Signal Recovery Using Neural Network Priors Under Uncertain Forward Model ParametersXiwen Chen, Wenhui Zhu, Peijie Qiu, Abolfazl Razi. 1420-1429 [doi]
- GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT PlanningJiaxi Lv, Yi Huang, Mingfu Yan, Jiancheng Huang, Jianzhuang Liu, Yifan Liu 0001, Yafei Wen, Xiaoxin Chen, Shifeng Chen. 1430-1440 [doi]
- 3D Kinematics Estimation from Video with a Biomechanical Model and Synthetic Training DataZhi-Yi Lin, Bofan Lyu, Judith Cueto Fernandez, Eline van der Kruk, Ajay Seth, Xucong Zhang. 1441-1450 [doi]
- Outsmarting Biometric Imposters: Enhancing Iris-Recognition System Security through Physical Adversarial Example Generation and PAD Fine-TuningYuka Ogino, Kazuya Kakizaki, Takahiro Toizumi, Atsushi Ito. 1451-1461 [doi]
- FIQA-FAS: Face Image Quality Assessment Based Face Anti-SpoofingYa-Chi Liang, Min-Xuan Qiu, Shang-Hong Lai. 1462-1470 [doi]
- Adversarial Identity Injection for Semantic Face Image SynthesisGiuseppe Tarollo, Tomaso Fontanini, Claudio Ferrari, Guido Borghi, Andrea Prati 0001. 1471-1480 [doi]
- Confidence-Aware RGB-D Face Recognition via Virtual Depth SynthesisZijian Chen, Mei Wang, Weihong Deng, Hongzhi Shi, Dongchao Wen, Yingjie Zhang, Xingchen Cui, Jian Zhao. 1481-1489 [doi]
- GraFIQs: Face Image Quality Assessment Using Gradient MagnitudesJan Niklas Kolf, Naser Damer, Fadi Boutros. 1490-1499 [doi]
- One Embedding to Predict Them All: Visible and Thermal Universal Face Representations for Soft Biometric Estimation via Vision TransformersNélida Mirabet Herranz, Chiara Galdi, Jean-Luc Dugelay. 1500-1509 [doi]
- Generalized Single-Image-Based Morphing Attack Detection Using Deep Representations from Vision TransformerHaoyu Zhang, Raghavendra Ramachandra, Kiran B. Raja, Christoph Busch 0001. 1510-1518 [doi]
- Can the accuracy bias by facial hairstyle be reduced through balancing the training data?Kagan Öztürk, Haiyu Wu, Kevin W. Bowyer. 1519-1528 [doi]
- TattTRN: Template Reconstruction Network for Tattoo RetrievalLázaro Janier González-Soler, Maciej Salwowski, Christian Rathgeb, Daniel Fischer. 1529-1538 [doi]
- What Makes Multimodal In-Context Learning Work?Folco Bertini Baldassini, Mustafa Shukor, Matthieu Cord, Laure Soulier, Benjamin Piwowarski. 1539-1550 [doi]
- Conv-Adapter: Exploring Parameter Efficient Transfer Learning for ConvNetsHao Chen 0102, Ran Tao 0013, Han Zhang 0048, Yidong Wang, Xiang Li 0106, Wei Ye 0004, Jindong Wang 0001, Guosheng Hu, Marios Savvides. 1551-1561 [doi]
- Enhancing Visual Question Answering through Question-Driven Image Captions as PromptsÖvgü Özdemir, Erdem Akagündüz. 1562-1571 [doi]
- AAPL: Adding Attributes to Prompt Learning for Vision-Language ModelsGahyeon Kim, Sohee Kim, SeokJu Lee. 1572-1582 [doi]
- Prompting Foundational Models for Omni-supervised Instance SegmentationArnav M. Das, Ritwick Chaudhry, Kaustav Kundu, Davide Modolo. 1583-1592 [doi]
- Low-Rank Few-Shot Adaptation of Vision-Language ModelsMaxime Zanella, Ismail Ben Ayed. 1593-1603 [doi]
- PointPrompt: A Multi-modal Prompting Dataset for Segment Anything ModelJorge Quesada, Mohammad AlOtaibi, Mohit Prabhushankar, Ghassan Alregib. 1604-1610 [doi]
- Uncovering the Hidden Cost of Model CompressionDiganta Misra, Muawiz Chaudhary, Agam Goyal, Bharat Runwal, Pin-Yu Chen. 1611-1621 [doi]
- MoCap-to-Visual Domain Adaptation for Efficient Human Mesh Estimation from 2D KeypointsBedirhan Uguz, Ozhan Suat, Batuhan Karagöz, Emre Akbas. 1622-1632 [doi]
- V-VIPE: Variational View Invariant Pose EmbeddingMara Levy, Abhinav Shrivastava. 1633-1642 [doi]
- A Survey on 3D Egocentric Human Pose EstimationMd Mushfiqur Azam, Kevin Desai. 1643-1654 [doi]
- CycleGANAS: Differentiable Neural Architecture Search for CycleGANTaegun An, Changhee Joo. 1655-1664 [doi]
- The devil is in discretization discrepancy. Robustifying Differentiable NAS with Single-Stage Searching ProtocolKonstanty Subbotko, Wojciech Jablonski, Piotr Bilinski. 1665-1674 [doi]
- UP-NAS: Unified Proxy for Neural Architecture SearchYi-Cheng Huang, Wei-Hua Li, Chih-Han Tsou, Jun-Cheng Chen, Chu-Song Chen. 1675-1684 [doi]
- CSCO: Connectivity Search of Convolutional OperatorsTunhou Zhang, Shiyu Li, Hsin-Pai Cheng, Feng Yan 0001, Hai Li 0001, Yiran Chen 0001. 1685-1694 [doi]
- GRASP-GCN: Graph-Shape Prioritization for Neural Architecture Search under Distribution ShiftsSofia Casarin, Oswald Lanz, Sergio Escalera. 1695-1703 [doi]
- QuantNAS: Quantization-aware Neural Architecture Search For Efficient Deployment On Mobile DeviceTianxiao Gao, Li Guo 0006, Shanwei Zhao, Peihan Xu, Yukun Yang, Xionghao Liu, Shihao Wang, Shiai Zhu, Dajiang Zhou. 1704-1713 [doi]
- Strategies to Leverage Foundational Model Knowledge in Object Affordance GroundingArushi Rai, Kyle Buettner, Adriana Kovashka. 1714-1723 [doi]
- Recognize Anything: A Strong Image Tagging ModelYoucai Zhang, Xinyu Huang, Jinyu Ma, Zhaoyang Li, Zhaochuan Luo, Yanchun Xie, Yuzhuo Qin, Tong Luo, Yaqian Li, Shilong Liu, Yandong Guo, Lei Zhang. 1724-1732 [doi]
- ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval ModelsAvinash Madasu, Vasudev Lal. 1733-1743 [doi]
- Continual Diffusion with STAMINA: STack-And-Mask INcremental AdaptersJames Seale Smith, Yen-Chang Hsu, Zsolt Kira, Yilin Shen, Hongxia Jin. 1744-1754 [doi]
- Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion ModelsGong Zhang 0011, Kai Wang 0058, Xingqian Xu, Zhangyang Wang, Humphrey Shi. 1755-1764 [doi]
- LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning. Junchi Wang, Lei Ke. 1765-1774
- Matting Anything. Jiachen Li 0003, Jitesh Jain, Humphrey Shi. 1775-1785
- Robustness Analysis on Foundational Segmentation Models. Madeline Chantry Schiappa, Shehreen Azad, Sachidanand VS, Yunhao Ge, Ondrej Miksik, Yogesh S. Rawat, Vibhav Vineet. 1786-1796
- Probing Conceptual Understanding of Large Visual-Language Models. Madeline Schiappa, Raiyaan Abdullah, Shehreen Azad, Jared Claypoole, Michael Cogswell, Ajay Divakaran, Yogesh S. Rawat. 1797-1807
- Show, Think, and Tell: Thought-Augmented Fine-Tuning of Large Language Models for Video Captioning. Byoungjip Kim, Dasol Hwang, Sungjun Cho, Youngsoo Jang, Honglak Lee, Moontae Lee. 1808-1817
- Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs. Davide Caffagni, Federico Cocchi, Nicholas Moratelli, Sara Sarto, Marcella Cornia, Lorenzo Baraldi 0002, Rita Cucchiara. 1818-1826
- Benchmarking Zero-Shot Recognition with Vision-Language Models: Challenges on Granularity and Specificity. Zhenlin Xu, Yi Zhu, Siqi Deng, Abhay Mittal, Yanbei Chen, Manchen Wang, Paolo Favaro, Joseph Tighe, Davide Modolo. 1827-1836
- Towards Efficient Audio-Visual Learners via Empowering Pre-trained Vision Transformers with Cross-Modal Adaptation. Kai Wang 0036, Yapeng Tian, Dimitrios Hatzinakos. 1837-1846
- ChatVTG: Video Temporal Grounding via Chat with Video Dialogue Large Language Models. Mengxue Qu, Xiaodong Chen, Wu Liu, Alicia Li, Yao Zhao 0001. 1847-1856
- SAM-PM: Enhancing Video Camouflaged Object Detection using Spatio-Temporal Attention. Muhammad Nawfal Meeran, Gokul Adethya T, Bhanu Pratyush Mantha. 1857-1866
- T2LM: Long-Term 3D Human Motion Generation from Multiple Sentences. Taeryung Lee, Fabien Baradel, Thomas Lucas 0002, Kyoung Mu Lee, Grégory Rogez. 1867-1876
- Speech2UnifiedExpressions: Synchronous Synthesis of Co-Speech Affective Face and Body Expressions from Affordable Inputs. Uttaran Bhattacharya, Aniket Bera, Dinesh Manocha. 1877-1887
- Exploring Text-to-Motion Generation with Human Preference. Jenny Sheng, Matthieu Lin, Andrew Zhao, Kevin Pruvost, Yu-Hui Wen, Yangguang Li, Gao Huang 0001, Yong-Jin Liu. 1888-1899
- Two-Person Interaction Augmentation with Skeleton Priors. Baiyi Li, Edmond S. L. Ho, Hubert P. H. Shum, He Wang 0002. 1900-1910
- Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation. Mathis Petrovich, Or Litany, Umar Iqbal 0001, Michael J. Black, Gül Varol, Xue Bin Peng, Davis Rempe. 1911-1921
- DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures. Steven Hogue, Chenxu Zhang, Hamza Daruger, Yapeng Tian, Xiaohu Guo. 1922-1931
- A Cross-Dataset Study for Text-based 3D Human Motion Retrieval. Léore Bensabath, Mathis Petrovich, Gül Varol. 1932-1940
- in2IN: Leveraging individual Information to Generate Human INteractions. Pablo Ruiz-Ponce, Germán Barquero, Cristina Palmero, Sergio Escalera, José García Rodríguez 0001. 1941-1951
- Fake it to make it: Using synthetic data to remedy the data shortage in joint multi-modal speech-and-gesture synthesis. Shivam Mehta, Anna Deichler, Jim O'Regan, Birger Moëll, Jonas Beskow, Gustav Eje Henter, Simon Alexanderson. 1952-1964
- Cross-Modal Fusion and Attention Mechanism for Weakly Supervised Video Anomaly Detection. Ayush Ghadiya, Purbayan Kar, Vishal M. Chudasama, Pankaj Wasnik. 1965-1974
- Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning. Zaber Ibn Abdul Hakim, Najibul Haque Sarker, Rahul Pratap Singh, Bishmoy Paul, Ali Dabouei, Min Xu. 1975-1985
- De-noised Vision-language Fusion Guided by Visual Cues for E-commerce Product Search. Zhizhang Hu, Shasha Li 0001, Ming Du, Arnab Dhua, Douglas Gray 0001. 1986-1996
- RGB-D Cube R-CNN: 3D Object Detection with Selective Modality Dropout. Jens Piekenbrinck, Alexander Hermans, Narunas Vaskevicius, Timm Linder, Bastian Leibe. 1997-2006
- Multimodal Understanding of Memes with Fair Explanations. Yang Zhong, Bhiman Kumar Baghel. 2007-2017
- Listen Then See: Video Alignment with Speaker Attention. Aviral Agrawal, Carlos Mateo Samudio Lezcano, Iqui Balam Heredia-Marin, Prabhdeep Singh Sethi. 2018-2027
- InVERGe: Intelligent Visual Encoder for Bridging Modalities in Report Generation. Ankan Deria, Komal Kumar, Snehashis Chakraborty, Dwarikanath Mahapatra, Sudipta Roy 0002. 2028-2038
- LAformer: Trajectory Prediction for Autonomous Driving with Lane-Aware Scene Constraints. Mengmeng Liu, Hao Cheng 0008, Lin Chen, Hellward Broszio, Jiangtao Li, Runjiang Zhao, Monika Sester, Michael Ying Yang. 2039-2049
- ZInD-Tell: Towards Translating Indoor Panoramas into Descriptions. Tonmoay Deb, Lichen Wang, Zachary Bessinger, Naji Khosravan, Eric Penner, Sing Bing Kang. 2050-2059
- VMCML: Video and Music Matching via Cross-Modality Lifting. Yi-Shan Lee, Wei-Cheng Tseng, Fu-En Wang, Min Sun 0001. 2060-2069
- AIGeN: An Adversarial Approach for Instruction Generation in VLN. Niyati Rawal, Roberto Bigazzi, Lorenzo Baraldi 0002, Rita Cucchiara. 2070-2080
- Multi-Modal Fusion of Event and RGB for Monocular Depth Estimation Using a Unified Transformer-based Architecture. Anusha Devulapally, Md Fahim Faysal Khan, Siddharth Advani, Vijaykrishnan Narayanan. 2081-2089
- Exploring the Role of Audio in Video Captioning. Yuhan Shen, Linjie Yang, Longyin Wen, Haichao Yu, Ehsan Elhamifar, Heng Wang. 2090-2100
- Dedicated Inference Engine and Binary-Weight Neural Networks for Lightweight Instance Segmentation. Tse-Wei Chen 0001, Wei Tao 0001, Dongyue Zhao, Kazuhiro Mima, Tadayuki Ito, Kinya Osa, Masami Kato. 2101-2110
- Lightweight Maize Disease Detection through Post-Training Quantization with Similarity Preservation. Carlos Victorino Padeiro, Tse-Wei Chen 0004, Takahiro Komamizu, Ichiro Ide. 2111-2120
- Multi-bit, Black-box Watermarking of Deep Neural Networks in Embedded Applications. Sam Leroux, Stijn Vanassche, Pieter Simoens. 2121-2130
- Pruning as a Binarization Technique. Lukas Frickenstein, Pierpaolo Morì, Shambhavi Balamuthu Sampath, Moritz Thoma, Nael Fasfous, Manoj Rohit Vemparala, Alexander Frickenstein, Christian Unger, Claudio Passerone, Walter Stechele. 2131-2140
- Neuromorphic Lip-Reading with Signed Spiking Gated Recurrent Units. Manon Dampfhoffer, Thomas Mesquida. 2141-2151
- Efficient Video Stabilization via Partial Block Phase Correlation on Edge GPUs. Cevahir Çigla. 2152-2161
- SciFlow: Empowering Lightweight Optical Flow Models with Self-Cleaning Iterations. Jamie Menjay Lin, Jisoo Jeong, Hong Cai, Risheek Garrepalli, Kai Wang, Fatih Porikli. 2162-2171
- Structured Sparse Back-propagation for Lightweight On-Device Continual Learning on Microcontroller Units. Francesco Paissan, Davide Nadalini, Manuele Rusci, Alberto Ancilotto, Francesco Conti 0001, Luca Benini, Elisabetta Farella. 2172-2181
- Multi-resolution Rescored ByteTrack for Video Object Detection on Ultra-low-power Embedded Systems. Luca Bompani, Manuele Rusci, Daniele Palossi, Francesco Conti 0001, Luca Benini. 2182-2190
- ED-DCFNet: an unsupervised encoder-decoder neural model for event-driven feature extraction and object tracking. Raz Ramon, Hadar Cohen-Duwek, Elishai Ezra Tsur. 2191-2199
- RAVN: Reinforcement Aided Adaptive Vector Quantization of Deep Neural Networks. Anamika Jha, Aratrik Chattopadhyay, Mrinal Banerji, Disha Jain. 2200-2209
- Prune Efficiently by Soft Pruning. Parakh Agarwal, Manu Mathew, Kunal Ranjan Patel, Varun Tripathi, Pramod Swami. 2210-2217
- Content-aware Input Scaling and Deep Learning Computation Offloading for Low-Latency Embedded Vision. Omkar Prabhune, Tianen Chen, Younghyun Kim 0001. 2218-2226
- Using Language-Aligned Gesture Embeddings for Understanding Gestures Accompanying Math Terms. Tristan Maidment, Purav J. Patel, Erin Walker, Adriana Kovashka. 2227-2237
- What does CLIP know about peeling a banana? Claudia Cuttano, Gabriele Rosi, Gabriele Trivigno, Giuseppe Averta. 2238-2247
- Task Navigator: Decomposing Complex Tasks for Multimodal Large Language Models. Feipeng Ma, Yizhou Zhou, Yueyi Zhang, Siying Wu, Zheyu Zhang 0002, Zilong He, Fengyun Rao, Xiaoyan Sun 0001. 2248-2257
- Multi-Explainable TemporalNet: An Interpretable Multimodal Approach using Temporal Convolutional Network for User-level Depression Detection. Anas Zafar, Danyal Aftab, Rizwan Qureshi, Yaofeng Wang, Hong Yan 0001. 2258-2265
- ViTA: An Efficient Video-to-Text Algorithm using VLM for RAG-based Video Analysis System. Md. Adnan Arefeen, Biplob Debnath, Md. Yusuf Sarwar Uddin, Srimat Chakradhar. 2266-2274
- Strategies to Improve Real-World Applicability of Laparoscopic Anatomy Segmentation Models. Fiona R. Kolbinger, Jiangpeng He, Jinge Ma, Fengqing Zhu 0001. 2275-2284
- nnMobileNet: Rethinking CNN for Retinopathy Research. Wenhui Zhu, Peijie Qiu, Xiwen Chen, Xin Li, Natasha Leporé, Oana M. Dumitrascu, Yalin Wang 0001. 2285-2294
- Distribution-Aware Multi-Label FixMatch for Semi-Supervised Learning on CheXpert. Sontje Ihler, Felix Kuhnke, Timo Kuhlgatz, Thomas Seel. 2295-2304
- Repurposing the Image Generative Potential: Exploiting GANs to Grade Diabetic Retinopathy. Isabella Poles, Eleonora D'Arnese, Luca G. Cellamare, Marco D. Santambrogio, Darvin Yi. 2305-2314
- Repeat and Concatenate: 2D to 3D Image Translation with 3D to 3D Generative Modeling. Abril Corona-Figueroa, Hubert P. H. Shum, Chris G. Willcocks. 2315-2324
- ControlPolypNet: Towards Controlled Colon Polyp Synthesis for Improved Polyp Segmentation. Vanshali Sharma, Abhishek Kumar, Debesh Jha, Manas Kamal Bhuyan, Pradip K. Das, Ulas Bagci. 2325-2334
- Generation of Structurally Realistic Retinal Fundus Images with Diffusion Models. Sojung Go, Younghoon Ji, Sang-Jun Park, Soochahn Lee. 2335-2344
- A Comparative Analysis of Implicit Augmentation Techniques for Breast Cancer Diagnosis Using Multiple Views. Yumnah Hasan, Talhat Khan, Darian Reyes Fernández de Bulnes, Juan F. H. Albarracín, Conor Ryan. 2345-2354
- Creating a Digital Twin of Spinal Surgery: A Proof of Concept. Jonas Hein, Frédéric Giraud, Lilian Calvet, Alexander Schwarz, Nicola Alessandro Cavalcanti, Sergey Prokudin, Mazda Farshad, Siyu Tang 0001, Marc Pollefeys, Fabio Carrillo, Philipp Fürnstahl. 2355-2364
- Codebook VQ-VAE Approach for Prostate Cancer Diagnosis using Multiparametric MRI. Ekaterina Redekop, Mara Pleasure, Zichen Wang, Karthik V. Sarma, Adam Kinnaird, William Speier, Corey W. Arnold. 2365-2372
- Advancing Brain Tumor Analysis: Curating a High-Quality MRI Dataset for Deep Learning-Based Molecular Marker Profiling. Divya D. Reddy, Niloufar Saadat, James M. Holcomb, Benjamin C. Wagner, Nghi C. Truong, Jason Bowerman, Kimmo J. Hatanpaa, Toral R. Patel, Marco C. Pinho, Ananth J. Madhuranthakam, Chandan Ganesh Bangalore Yogananda, Joseph A. Maldjian. 2373-2379
- Privacy-Preserving Collaboration for Multi-Organ Segmentation via Federated Learning from Sites with Partial Labels. Adway U. Kanhere, Pranav Kulkarni, Paul H. Yi, Vishwa S. Parekh. 2380-2387
- GSAM+Cutie: Text-Promptable Tool Mask Annotation for Endoscopic Video. Roger D. Soberanis-Mukul, Jiahuan Cheng, Jan Emily Mangulabnan, S. Swaroop Vedula, Masaru Ishii, Gregory D. Hager, Russell H. Taylor, Mathias Unberath. 2388-2394
- MMIST-ccRCC: A Real World Medical Dataset for the Development of Multi-Modal Systems. Tiago Mota, Maria Rita Verdelho, Diogo J. Araújo, Alceu Bissoto, Carlos Santiago, Catarina Barata. 2395-2403
- Hairy Ground Truth Enhancement for Semantic Segmentation. Sophie Fischer, Irina Voiculescu. 2404-2412
- Beyond respiratory models: a physics-enhanced synthetic data generation method for 2D-3D deformable registration. François Lecomte, Pablo Alvarez 0001, Stéphane Cotin, Jean-Louis Dillenseger. 2413-2421
- UltraAugment: Fan-shape and Artifact-based Data Augmentation for 2D Ultrasound Images. Florian Ramakers, Tom Vercauteren, Jan Deprest, Helena Williams. 2422-2431
- PARASOL: Parametric Style Control for Diffusion Image Synthesis. Gemma Canet Tarres, Dan Ruta, Tu Bui, John P. Collomosse. 2432-2442
- Extending global-local view alignment for self-supervised learning with remote sensing imagery. Xinye Wanyan, Sachith Seneviratne, Shuchang Shen, Michael Kirley. 2443-2453
- RetinaLiteNet: A Lightweight Transformer based CNN for Retinal Feature Segmentation. Mehwish Mehmood, Majed Alsharari, Shahzaib Iqbal, Ivor Spence, Muhammad Fahim. 2454-2463
- ABC-CapsNet: Attention based Cascaded Capsule Network for Audio Deepfake Detection. Taiba Majid Wani, Reeva Gulzar, Irene Amerini. 2464-2472
- GestFormer: Multiscale Wavelet Pooling Transformer Network for Dynamic Hand Gesture Recognition. Mallika Garg, Debashis Ghosh, Pyari Mohan Pradhan. 2473-2483
- Unsupervised Domain Adaptation for Weed Segmentation Using Greedy Pseudo-labelling. Yingchao Huang, Abdul Bais. 2484-2494
- RePoseDM: Recurrent Pose Alignment and Gradient Guidance for Pose Guided Image Synthesis. Anant Khandelwal. 2495-2504
- Is Synthetic Data all We Need? Benchmarking the Robustness of Models Trained with Synthetic Images. Krishnakant Singh, Thanush Navaratnam, Jannik Holmer, Simone Schaub-Meyer, Stefan Roth 0001. 2505-2515
- FloCoDe: Unbiased Dynamic Scene Graph Generation with Temporal Consistency and Correlation Debiasing. Anant Khandelwal. 2516-2526
- VideoSAGE: Video Summarization with Graph Representation Learning. Jose M. Rojas Chaves, Subarna Tripathi. 2527-2534
- EgoSG: Learning 3D Scene Graphs from Egocentric RGB-D Sequences. Chaoyi Zhang, Xitong Yang, Ji Hou, Kris Kitani, Weidong Cai 0001, Fu-Jen Chu. 2535-2545
- Efflex: Efficient and Flexible Pipeline for Spatio-Temporal Trajectory Graph Modeling and Representation Learning. Ming Cheng, Ziyi Zhou, Bowen Zhang, Ziyu Wang, Jiaqi Gan, Ziang Ren, Weiqi Feng, Yi Lyu, Hefan Zhang, Xingjian Diao. 2546-2555
- Segment Anything Model for Road Network Graph Extraction. Congrui Hetang, Haoru Xue, Cindy X. Le, Tianwei Yue, Wenping Wang, Yihui He. 2556-2566
- A Review and Efficient Implementation of Scene Graph Generation Metrics. Julian Lorenz, Robin Schön, Katja Ludwig, Rainer Lienhart. 2567-2575
- SemiGPC: Distribution-Aware Label Refinement for Imbalanced Semi-Supervised Learning Using Gaussian Processes. Abdelhak Lemkhenter, Manchen Wang, Luca Zancato, Gurumurthy Swaminathan, Paolo Favaro, Davide Modolo. 2576-2585
- Uncertainty-based Forgetting Mitigation for Generalized Few-Shot Object Detection. Karim Guirguis, George Eskandar, Mingyang Wang, Matthias Kayser, Eduardo Monari, Bin Yang 0009, Jürgen Beyerer. 2586-2595
- Image-caption difficulty for efficient weakly-supervised object detection from in-the-wild data. Giacomo Nebbia, Adriana Kovashka. 2596-2605
- Learning Tracking Representations from Single Point Annotations. Qiangqiang Wu, Antoni B. Chan. 2606-2615
- CDAD-Net: Bridging Domain Gaps in Generalized Category Discovery. Sai Bhargav Rongali, Sarthak Mehrotra, Ankit Jha, Mohamad Hassan N. C, Shirsha Bose, Tanisha Gupta, Mainak Singha, Biplab Banerjee. 2616-2626
- Audio-Visual Generalized Zero-Shot Learning using Pre-Trained Large Multi-Modal Models. David Kurzendörfer, Otniel-Bogdan Mercea, A. Sophia Koepke, Zeynep Akata. 2627-2638
- Latent-based Diffusion Model for Long-tailed Recognition. Pengxiao Han, Changkun Ye, Jieming Zhou, Jing Zhang, Jie Hong, Xuesong Li. 2639-2648
- MoDA: Leveraging Motion Priors from Videos for Advancing Unsupervised Domain Adaptation in Semantic Segmentation. Fei Pan, Xu Yin, SeokJu Lee, Axi Niu, Sung-Eui Yoon, In-So Kweon. 2649-2658
- Active Transferability Estimation. Tarun Ram Menta, Surgan Jandial, Akash Patil, Saketh Bachu, Vimal K. B., Balaji Krishnamurthy, Vineeth N. Balasubramanian, Mausoom Sarkar, Chirag Agarwal. 2659-2670
- What is Point Supervision Worth in Video Instance Segmentation? Shuaiyi Huang, De-An Huang, Zhiding Yu, Shiyi Lan, Subhashree Radhakrishnan, José M. Álvarez 0004, Abhinav Shrivastava, Anima Anandkumar. 2671-2681
- UVIS: Unsupervised Video Instance Segmentation. Shuaiyi Huang, Saksham Suri, Kamal Gupta 0002, Sai Saketh Rambhatla, Ser-Nam Lim, Abhinav Shrivastava. 2682-2692
- Open-world Instance Segmentation: Top-down Learning with Bottom-up Supervision. Tarun Kalluri, Weiyao Wang 0001, Heng Wang, Manmohan Chandraker, Lorenzo Torresani, Du Tran. 2693-2703
- Weakly-Supervised Temporal Action Localization with Multi-Modal Plateau Transformers. Xin Hu, Kai Li 0012, Deep Patel, Erik Kruus, Martin Renqiang Min, Zhengming Ding. 2704-2713
- On Accuracy and Speed of Geodesic Regression: Do Geometric Priors Improve Learning on Small Datasets? Adele Myers, Nina Miolane. 2714-2722
- Human-in-the-Loop Segmentation of Multi-species Coral Imagery. Scarlett Raine, Ross Marchant, Brano Kusy, Frédéric Maire, Niko Sünderhauf, Tobias Fischer 0001. 2723-2732
- Zero-Shot Monocular Motion Segmentation in the Wild by Combining Deep Learning with Geometric Motion Model Fusion. Yuxiang Huang, Yuhao Chen, John Zelek. 2733-2743
- Generalized Few-Shot Meets Remote Sensing: Discovering Novel Classes in Land Cover Mapping via Hybrid Semantic Segmentation Framework. Zhuohong Li, Fangxiao Lu, Jiaqi Zou, Lei Hu, Hongyan Zhang 0001. 2744-2754
- Learnable Prompt for Few-Shot Semantic Segmentation in Remote Sensing Domain. Steve Andreas Immanuel, Hagai Raja Sinulingga. 2755-2761
- Class Similarity Transition: Decoupling Class Similarities and Imbalance from Generalized Few-shot Segmentation. Shihong Wang, Ruixun Liu, Kaiyu Li, Jiawei Jiang, Xiangyong Cao. 2762-2770
- Enrich, Distill and Fuse: Generalized Few-Shot Semantic Segmentation in Remote Sensing Leveraging Foundation Model's Assistance. Tianyi Gao, Wei Ao, Xing-Ao Wang, Yuanhao Zhao, Ping Ma, Mengjie Xie, Hang Fu, Jinchang Ren, Zhi Gao. 2771-2780
- Dynamic Knowledge Adapter with Probabilistic Calibration for Generalized Few-Shot Semantic Segmentation. Jintao Tong, Haichen Zhou, Yicong Liu, Yiman Hu, Yixiong Zou. 2781-2790
- Localised-NeRF: Specular Highlights and Colour Gradient Localising in NeRF. Dharmendra Selvaratnam, Dena Bazazian. 2791-2801
- Recon3D: High Quality 3D Reconstruction from a Single Image Using Generated Back-View Explicit Priors. Ruiyang Chen, Mohan Yin, Jiawei Shen, Wei Ma. 2802-2811
- GHNeRF: Learning Generalizable Human Features with Efficient Neural Radiance Fields. Arnab Dey, Di Yang 0002, Rohith Agaram, Antitza Dantcheva, Andrew I. Comport, Srinath Sridhar 0002, Jean Martinet. 2812-2821
- Analyzing the Internals of Neural Radiance Fields. Lukas Radl, Andreas Kurz, Michael Steiner 0011, Markus Steinberger. 2822-2831
- Unveiling the Ambiguity in Neural Inverse Rendering: A Parameter Compensation Analysis. Georgios Kouros, Minye Wu, Sushruth Nagesh, Xianling Zhang, Tinne Tuytelaars. 2832-2841
- SAD-GS: Shape-aligned Depth-supervised Gaussian Splatting. Pou-Chun Kung, Seth Isaacson, Ram Vasudevan, Katherine A. Skinner. 2842-2851
- CoLa-SDF: Controllable Latent StyleSDF for Disentangled 3D Face Generation. Rahul Dey, Bernhard Egger 0001, Vishnu Naresh Boddeti, Ye Wang 0001, Tim K. Marks. 2852-2861
- SLAIM: Robust Dense Neural SLAM for Online Tracking and Mapping. Vincent Cartillier, Grant Schindler, Irfan Essa. 2862-2871
- NeRF as Pretraining at Scale: Generalizable 3D-Aware Semantic Representation Learning from View Prediction. Wenyan Cong, Hanxue Liang, Zhiwen Fan, Peihao Wang, Yifan Jiang 0001, Dejia Xu, A. Cengiz Öztireli, Zhangyang Wang. 2872-2882
- Neural Fields for Co-Reconstructing 3D Objects from Incidental 2D Data. Dylan Campbell, Eldar Insafutdinov, João F. Henriques, Andrea Vedaldi. 2883-2893
- Large Language Models in Wargaming: Methodology, Application, and Robustness. Yuwei Chen, Shiyong Chu. 2894-2903
- Enhancing Targeted Attack Transferability via Diversified Weight Pruning. Hung-Jui Wang, Yu-Yu Wu, Shang-Tse Chen. 2904-2914
- Enhancing the Transferability of Adversarial Attacks with Stealth Preservation. Xinwei Zhang, Tianyuan Zhang 0004, Yitong Zhang, Shuangcheng Liu. 2915-2925
- Benchmarking Robustness in Neural Radiance Fields. Chen Wang 0049, Angtian Wang, Junbo Li, Alan L. Yuille, Cihang Xie. 2926-2936
- Sharpness-Aware Optimization for Real-World Adversarial Attacks for Diverse Compute Platforms with Enhanced Transferability. Muchao Ye, Xiang Xu, Qin Zhang, Jonathan Wu 0002. 2937-2946
- Red-Teaming Segment Anything Model. Krzysztof Jankowski, Bartlomiej Sobieski, Mateusz Kwiatkowski, Jakub Szulc, Michal Janik, Hubert Baniecki, Przemyslaw Biecek. 2947-2956
- Learning to Schedule Resistant to Adversarial Attacks in Diffusion Probabilistic Models Under the Threat of Lipschitz Singularities. Sanghwa Hong. 2957-2966
- Multimodal Attack Detection for Action Recognition Models. Furkan Mumcu, Yasin Yilmaz. 2967-2976
- Deep Learning-Based Identification of Arctic Ocean Boundaries and Near-Surface Phenomena in Underwater Echograms. Femina Senjaliya, Melissa Cote, Amanda Dash, Alexandra Branzan Albu, Andrea Niemi, Stéphane Gauthier, Julek Chawarski, Steve Pearce, Kaan Ersahin, Keath Borg. 2977-2986
- BiMAE - A Bimodal Masked Autoencoder Architecture for Single-Label Hyperspectral Image Classification. Maksim Kukushkin, Martin Bogdan, Thomas Schmid 0003. 2987-2996
- DaFF: Dual Attentive Feature Fusion for Multispectral Pedestrian Detection. Afnan Althoupety, Li-Yun Wang, Wu-chi Feng, Banafsheh Rekabdar. 2997-3006
- HNN: Hierarchical Noise-Deinterlace Net Towards Image Denoising. Amogh Joshi, Nikhil Akalwadi, Chinmayee Mandi, Chaitra Desai, Ramesh Ashok Tabib, Ujwala Patil, Uma Mudenagudi. 3007-3016
- Seeing the Vibration from Fiber-Optic Cables: Rain Intensity Monitoring using Deep Frequency Filtering. Zhuocheng Jiang, Yangmin Ding, Junhui Zhao, Yue Tian, Shaobo Han, Sarper Ozharar, Ting Wang 0016, James M. Moore. 3017-3026
- SwinFuSR: an image fusion-inspired model for RGB-guided thermal image super-resolution. Cyprien Arnold, Philippe Jouvet, Lama Seoud. 3027-3036
- CAFF-DINO: Multi-spectral object detection transformers with cross-attention features fusion. Kevin Helvig, Baptiste Abeloos, Pauline Trouvé-Peloux. 3037-3046
- Learning Surface Terrain Classifications from Ground Penetrating Radar. Anja Sheppard, Jason Brown, Nilton O. Renno, Katherine A. Skinner. 3047-3055
- Scattering Prompt Tuning: A Fine-tuned Foundation Model for SAR Object Recognition. Weilong Guo, Shengyang Li, Jian Yang. 3056-3065
- MvAV-pix2pixHD: Multi-view Aerial View Image Translation. Jun Yu, Keda Lu, Shenshen Du, Lin Xu, Peng Chang, Houde Liu, Bin Lan, Tianyu Liu. 3066-3075
- Flexible Window-based Self-attention Transformer in Thermal Image Super-Resolution. Hongcheng Jiang, Zhiqiang Chen. 3076-3085
- Multi-Scale Feature Fusion using Channel Transformers for Guided Thermal Image Super Resolution. Raghunath Sai Puttagunta, Birendra Kathariya, Zhu Li 0001, George York. 3086-3095
- Multi-modal Aerial View Image Challenge: Sensor Domain Translation. Spencer Low, Oliver Nina, Dylan Bowald, Angel Domingo Sappa, Nathan Inkawhich, Peter Bruns. 3096-3104
- Multi-modal Aerial View Image Challenge: SAR Classification. Spencer Low, Oliver Nina, Dylan Bowald, Angel Domingo Sappa, Nathan Inkawhich, Peter Bruns. 3105-3112
- Thermal Image Super-Resolution Challenge Results - PBVS 2024. Rafael E. Rivadeneira, Angel Domingo Sappa, Chenyang Wang 0002, Junjun Jiang, Zhiwei Zhong, Peilin Chen, Shiqi Wang 0001. 3113-3122
- Exploring the usage of diffusion models for thermal image super-resolution: a generic, uncertainty-aware approach for guided and non-guided schemes. Carlos Cortés-Mendez, Jean-Bernard Hayet. 3123-3130
- Narrowing the Synthetic-to-Real Gap for Thermal Infrared Semantic Image Segmentation Using Diffusion-based Conditional Image Synthesis. Christian Mayr 0004, Christian Kübler, Norbert Haala, Michael Teutsch. 3131-3141
- Performance Evaluation of Segment Anything Model with Variational Prompting for Application to Non-Visible Spectrum Imagery. Yona Falinie A. Gaus, Neelanjan Bhowmik, Brian K. S. Isaac-Medina, Toby P. Breckon. 3142-3152
- Forward-Forward Algorithm for Hyperspectral Image Classification. Abel A. Reyes Angulo, Sidike Paheding. 3153-3161
- Revisiting pre-trained remote sensing model benchmarks: resizing and normalization matters. Isaac Corley, Caleb Robinson, Rahul Dodhia, Juan M. Lavista Ferres, Peyman Najafirad. 3162-3172
- Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data. Ivan DeAndres-Tame, Ruben Tolosana, Pietro Melzi, Rubén Vera-Rodríguez, Minchul Kim, Christian Rathgeb, Xiaoming Liu 0002, Aythami Morales, Julian Fiérrez, Javier Ortega-Garcia, Zhizhou Zhong, Yuge Huang, Yuxi Mi, Shouhong Ding, Shuigeng Zhou, Shuai He, Lingzhi Fu, Heng Cong, Rongyu Zhang, Zhihong Xiao, Evgeny Smirnov, Anton Pimenov, Aleksei Grigorev, Denis Timoshenko, Kaleb Mesfin Asfaw, Cheng-Yaw Low, Hao Liu, Chuyi Wang, Qing Zuo, Zhixiang He, Hatef Otroshi-Shahreza, Anjith George, Alexander Unnervik, Parsa Rahimi, Sébastien Marcel, Pedro C. Neto, Marco Huber, Jan Niklas Kolf, Naser Damer, Fadi Boutros, Jaime S. Cardoso 0001, Ana Filipa Sequeira, Andrea Atzori, Gianni Fenu, Mirko Marras, Vitomir Struc, Jiang Yu, Zhangjie Li, Jichun Li, Weisong Zhao, Zhen Lei 0001, Xiangyu Zhu 0001, Xiao-Yu Zhang, Bernardo Biesseck, Pedro Vidal 0001, Luiz Coelho, Roger Granada, David Menotti. 3173-3183
- FineRehab: A Multi-modality and Multi-task Dataset for Rehabilitation Analysis. Jianwei Li, Jun Xue, Rui Cao, Xiaoxia Du, Siyu Mo, Kehao Ran, Zeyan Zhang. 3184-3193
- Augmenting Pass Prediction via Imitation Learning in Soccer Simulations. Takeshi Kaneko, Rei Kawakami, Takeshi Naemura, Nakamasa Inoue. 3194-3203
- Hierarchical NeuroSymbolic Approach for Comprehensive and Explainable Action Quality Assessment. Lauren Okamoto, Paritosh Parmar. 3204-3213
- AutoSoccerPose: Automated 3D posture Analysis of Soccer Shot Movements. Calvin C. K. Yeung, Kenjiro Ide, Keisuke Fujii 0001. 3214-3224
- Video Interaction Recognition using an Attention Augmented Relational Network and Skeleton Data. Farzaneh Askari, Cyril Yared, Rohit Ramaprasad, Devin Garg, Anjun Hu, James J. Clark. 3225-3234
- A General Framework for Jersey Number Recognition in Sports Video. Maria Koshkina, James H. Elder. 3235-3244
- MV-Soccer: Motion-Vector Augmented Instance Segmentation for Soccer Player Tracking. Fahad Majeed, Nauman Ullah Gilal, Khaled A. Al-Thelaya, Yin Yang 0001, Marco Agus, Jens Schneider 0002. 3245-3255
- Rugby Scene Classification Enhanced by Vision Language Model. Naoki Nonaka, Ryo Fujihira, Toshiki Koshiba, Akira Maeda, Jun Seita. 3256-3266
- X-VARS: Introducing Explainability in Football Refereeing with Multi-Modal Large Language Models. Jan Held, Hani Itani, Anthony Cioppa, Silvio Giancola, Bernard Ghanem, Marc Van Droogenbroeck. 3267-3279
- SoccerNet-Depth: a Scalable Dataset for Monocular Depth Estimation in Sports Videos. Arnaud Leduc, Anthony Cioppa, Silvio Giancola, Bernard Ghanem, Marc Van Droogenbroeck. 3280-3282
- SoccerNet Game State Reconstruction: End-to-End Athlete Tracking and Identification on a Minimap. Vladimir Somers, Victor Joos, Anthony Cioppa, Silvio Giancola, Seyed Abolfazl Ghasemzadeh, Floriane Magera, Baptiste Standaert, Amir M. Mansourian, Xin Zhou 0024, Shohreh Kasaei, Bernard Ghanem, Alexandre Alahi, Marc Van Droogenbroeck, Christophe De Vleeschouwer. 3293-3305
- Multi-Modal Hit Detection and Positional Analysis in Padel Competitions. Robbe Decorte, Martin Paré, Jelle Vanhaeverbeke, Joachim Taelman, Maarten Slembrouck, Steven Verstockt. 3306-3314
- Pseudo-label based unsupervised fine-tuning of a monocular 3D pose estimation model for sports motions. Tomohiro Suzuki, Ryota Tanaka, Kazuya Takeda, Keisuke Fujii 0001. 3315-3324
- No Bells, Just Whistles: Sports Field Registration by Leveraging Geometric Properties. Marc Gutiérrez-Pérez, Antonio Agudo. 3325-3334
- A Universal Protocol to Benchmark Camera Calibration for Sports. Floriane Magera, Thomas Hoyoux, Olivier Barnich, Marc Van Droogenbroeck. 3335-3346
- Table tennis ball spin estimation with an event camera. Thomas Gossard, Julian Krismer, Andreas Ziegler 0006, Jonas Tebbe, Andreas Zell. 3347-3356
- TeamTrack: A Dataset for Multi-Sport Multi-Object Tracking in Full-pitch Videos. Atom Scott, Ikuma Uchida, Ning Ding, Rikuhei Umemoto, Rory P. Bunker, Ren Kobayashi, Takeshi Koyama, Masaki Onishi, Yoshinari Kameda, Keisuke Fujii 0001. 3357-3366
- Event-based Ball Spin Estimation in Sports. Takuya Nakabayashi, Kyota Higa, Masahiro Yamaguchi, Ryo Fujiwara, Hideo Saito. 3367-3375
- A stroke of genius: Predicting the next move in badminton. Magnus Ibh, Stella Graßhof, Dan Witzner Hansen. 3376-3385
- Beyond the Premier: Assessing Action Spotting Transfer Capability Across Diverse Domains. Bruno Cabado, Anthony Cioppa, Silvio Giancola, Andrés Villa, Bertha Guijarro-Berdiñas, Emilio J. Padrón 0001, Bernard Ghanem, Marc Van Droogenbroeck. 3386-3398
- Medium Scale Benchmark for Cricket Excited Actions Understanding. Altaf Hussain, Noman Khan, Muhammad Munsif, Min Je Kim, Sung Wook Baik. 3399-3409
- T-DEED: Temporal-Discriminability Enhancer Encoder-Decoder for Precise Event Spotting in Sports Videos. Artur Xarles, Sergio Escalera, Thomas B. Moeslund, Albert Clapés. 3410-3419
- PitcherNet: Powering the Moneyball Evolution in Baseball Video Analytics. Jerrin Bright, Bavesh Balaji, Yuhao Chen, David A. Clausi, John S. Zelek. 3420-3429
- ExerAIde: AI-assisted Multimodal Diagnosis for Enhanced Sports Performance and Personalised Rehabilitation. Ahmed Qazi, Asim Iqbal. 3430-3438
- Look, Listen, and Attack: Backdoor Attacks Against Video Action Recognition. Hasan Abed Al Kader Hammoud, Shuming Liu, Mohammed Alkhrashi, Fahad Albalawi, Bernard Ghanem. 3439-3450
- Understanding ReLU Network Robustness Through Test Set Certification Performance. Nicola Franco, Jeanette Miriam Lorenz, Karsten Roscher, Stephan Günnemann. 3451-3460
- Reliable Trajectory Prediction and Uncertainty Quantification with Conditioned Diffusion Models. Marion Neumeier, Sebastian Dorn, Michael Botsch, Wolfgang Utschick. 3461-3470
- Hinge-Wasserstein: Estimating Multimodal Aleatoric Uncertainty in Regression Tasks. Ziliang Xiong, Arvi Jonnarth, Abdelrahman Eldesokey, Joakim Johnander, Bastian Wandt, Per-Erik Forssén. 3471-3480
- AdvDenoise: Fast Generation Framework of Universal and Robust Adversarial Patches Using Denoise. Jing Li, Zigan Wang, Jinliang Li. 3481-3490
- Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations. Maximilian Dreyer, Reduan Achtibat, Wojciech Samek, Sebastian Lapuschkin. 3491-3501
- Situation Monitor: Diversity-Driven Zero-Shot Out-of-Distribution Detection using Budding Ensemble Architecture for Object Detection. Syed Sha Qutub, Michael Paulitsch, Kay-Ulrich Scholl, Neslihan Köse Cihangir, Korbinian Hagn, Fabian Oboril, Gereon Hinz, Alois Knoll. 3502-3511
- The Penalized Inverse Probability Measure for Conformal Classification. Paul Melki, Lionel Bombrun, Boubacar Diallo, Jérôme Dias, Jean Pierre Da Costa. 3512-3521
- Run-time Monitoring of 3D Object Detection in Automated Driving Systems Using Early Layer Neural Activation Patterns. Hakan Yekta Yatbaz, Mehrdad Dianati, Konstantinos Koufos, Roger Woodman. 3522-3531
- Reactive Model Correction: Mitigating Harm to Task-Relevant Features via Conditional Bias Suppression. Dilyara Bareeva, Maximilian Dreyer, Frederik Pahde, Wojciech Samek, Sebastian Lapuschkin. 3532-3541
- Investigating Calibration and Corruption Robustness of Post-hoc Pruned Perception CNNs: An Image Classification Benchmark Study. Pallavi Mitra, Gesina Schwalbe, Nadja Klein. 3542-3552
- Towards Weakly-Supervised Domain Adaptation for Lane Detection. Jingxing Zhou, Chongzhe Zhang, Jürgen Beyerer. 3553-3563
- Towards Engineered Safe AI with Modular Concept Models. Lena Heidemann, Iwo Kurzidem, Maureen Monnet, Karsten Roscher, Stephan Günnemann. 3564-3573
- Conformal Semantic Image Segmentation: Post-hoc Quantification of Predictive Uncertainty. Luca Mossina, Joseba Dalmau, Léo Andéol. 3574-3584
- A Comprehensive Analysis of Factors Impacting Membership Inference. Daniel DeAlcala, Gonzalo Mancera, Aythami Morales, Julian Fiérrez, Ruben Tolosana, Javier Ortega-Garcia. 3585-3593
- Exploiting CLIP Self-Consistency to Automate Image Augmentation for Safety Critical ScenariosSujan Sai Gannamaneni, Frederic Klein, Michael Mock, Maram Akila. 3594-3604 [doi]
- Adaptive Memory Replay for Continual LearningJames Seale Smith, Lazar Valkov, Shaunak Halbe, Vyshnavi Gutta, Rogério Feris, Zsolt Kira, Leonid Karlinsky. 3605-3615 [doi]
- Adapting the Segment Anything Model During Usage in Novel SituationsRobin Schön, Julian Lorenz, Katja Ludwig, Rainer Lienhart. 3616-3626 [doi]
- PMAFusion: Projection-Based Multi-Modal Alignment for 3D Semantic Occupancy PredictionShiyao Li, Wenming Yang, Qingmin Liao. 3627-3634 [doi]
- SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial UnderstandingHaoxiang Wang, Pavan Kumar Anasosalu Vasu, Fartash Faghri, Raviteja Vemulapalli, Mehrdad Farajtabar, Sachin Mehta, Mohammad Rastegari, Oncel Tuzel, Hadi Pouransari. 3635-3647 [doi]
- QAttn: Efficient GPU Kernels for mixed-precision Vision TransformersPiotr Kluska, Adrián Castelló 0001, Florian Scheidegger, A. Cristiano I. Malossi, Enrique S. Quintana-Ortí. 3648-3657 [doi]
- Efficient Transformer Adaptation with Soft Token MergingXin Yuan, Hongliang Fei, Jinoo Baek. 3658-3668 [doi]
- HaLViT: Half of the Weights are EnoughOnur Can Koyun, Behçet Ugur Töreyin. 3669-3678 [doi]
- Parameter Efficient Fine-tuning of Self-supervised ViTs without Catastrophic ForgettingReza Akbarian Bafghi, Nidhin Harilal, Claire Monteleoni, Maziar Raissi. 3679-3684 [doi]
- Automatic Recognition of Food Ingestion Environment from the AIM-2 Wearable SensorYuning Huang, M. A. Hassan, Jiangpeng He, Janine A. Higgins, Megan A. McCrory, Heather A. Eicher-Miller, J. Graham Thomas, Edward Sazonov, Fengqing Zhu 0001. 3685-3694 [doi]
- Learning to Classify New Foods Incrementally Via Compressed ExemplarsJustin Yang, Zhihao Duan, Jiangpeng He, Fengqing Zhu 0001. 3695-3704 [doi]
- MP-PolarMask: A Faster and Finer Instance Segmentation for Concave ImagesKe-Lei Wang, Pin-Hsuan Chou, Young-Ching Chou, Chia-Jen Liu, Cheng-Kuan Lin, Yu-Chee Tseng. 3705-3714 [doi]
- Segment Anything in Food ImagesSaeed S. Alahmari, Michael Gardner, Tawfiq Salem. 3715-3720 [doi]
- Shape-Preserving Generation of Food Images for Automatic Dietary AssessmentGuangzong Chen, Zhi-Hong Mao, Mingui Sun, Kangni Liu, Wenyan Jia. 3721-3731 [doi]
- A Generative Exploration of Cuisine TransferPhilip Wootaek Shin, Ajay Narayanan Sridhar, Jack Sampson, Vijaykrishnan Narayanan. 3732-3740 [doi]
- Food Portion Estimation via 3D Object ScalingGautham Vinod, Jiangpeng He, Zeman Shao, Fengqing Zhu 0001. 3741-3749 [doi]
- LOFI: LOng-tailed FIne-Grained Network for Food RecognitionJesús M. Rodríguez-de-Vera, Imanol G. Estepa, Marc Bolaños, Bhalaji Nagarajan, Petia Radeva. 3750-3760 [doi]
- How Much You Ate? Food Portion Estimation on SpoonsAaryam Sharma, Chris Czarnecki, Yuhao Chen, Pengcheng Xi, Linlin Xu, Alexander Wong. 3761-3770 [doi]
- Faster Than Lies: Real-time Deepfake Detection using Binary Neural NetworksRomeo Lanzino, Federico Fontana, Anxhelo Diko, Marco Raoul Marini, Luigi Cinque. 3771-3780 [doi]
- Latent Flow Diffusion for Deepfake Video GenerationAashish Chandra K, Aashutosh A V, Srijan Das, Abhijit Das 0001. 3781-3790 [doi]
- Deepfake Catcher: Can a Simple Fusion be Effective and Outperform Complex DNNs?Akshay Agarwal 0001, Nalini K. Ratha. 3791-3801 [doi]
- DiffSeg: Towards Detecting Diffusion-Based Inpainting Attacks Using Multi-Feature SegmentationRaphael Antonius Frick, Martin Steinebach. 3802-3808 [doi]
- PUDD: Towards Robust Multi-modal Prototype-based Deepfake DetectionAlvaro Lopez Pellicer, Yi Li, Plamen Angelov 0001. 3809-3817 [doi]
- Demographic Bias Effects on Face Image SynthesisRoberto Leyva, Victor Sanchez, Gregory Epiphaniou, Carsten Maple. 3818-3826 [doi]
- Evaluating the Integration of Morph Attack Detection in Automated Face Recognition SystemsAndrea Panzino, Simone Maurizio La Cava, Giulia Orrù, Gian Luca Marcialis. 3827-3836 [doi]
- Temporal surface frame anomalies for deepfake video detectionAndrea Ciamarra, Roberto Caldelli, Alberto Del Bimbo. 3837-3844 [doi]
- Quality-based Artifact Modeling for Facial Deepfake Detection in VideosSara Concas, Simone Maurizio La Cava, Roberto Casula, Giulia Orrù, Giovanni Puglisi, Gian Luca Marcialis. 3845-3854 [doi]
- MaskSim: Detection of synthetic images by masked spectrum similarity analysisYanhao Li, Quentin Bammey, Marina Gardella, Tina Nikoukhah, Jean-Michel Morel, Miguel Colom, Rafael Grompone von Gioi. 3855-3865 [doi]
- Divide and Conquer: High-Resolution Industrial Anomaly Detection via Memory Efficient Tiled EnsembleBlaz Rolih, Dick Ameln, Ashwin Vaidya, Samet Akcay. 3866-3875 [doi]
- Omni-Crack30k: A Benchmark for Crack Segmentation and the Reasonable Effectiveness of Transfer LearningChristian Benz, Volker Rodehorst. 3876-3886 [doi]
- Video Anomaly Detection via Spatio-Temporal Pseudo-Anomaly Generation: A Unified ApproachAyush K. Rai, Tarun Krishna, Feiyan Hu, Alexandru Drimbarean, Kevin McGuinness, Alan F. Smeaton, Noel E. O'Connor. 3887-3899 [doi]
- Blind Localization and Clustering of Anomalies in TexturesAndrei-Timotei Ardelean, Tim Weyrich. 3900-3909 [doi]
- Test Time Training for Industrial Anomaly SegmentationAlex Costanzino, Pierluigi Zama Ramirez, Mirko Del Moro, Agostino Aiezzo, Giuseppe Lisanti, Samuele Salti, Luigi di Stefano. 3910-3920 [doi]
- TAB: Text-Align Anomaly Backbone Model for Industrial Inspection TasksHo-Weng Lee, Shang-Hong Lai. 3921-3929 [doi]
- Tri-VAE: Triplet Variational Autoencoder for Unsupervised Anomaly Detection in Brain Tumor MRIHansen Wijanarko, Evelyne Calista, Li-Fen Chen, Yong-Sheng Chen. 3930-3939 [doi]
- Dynamic Addition of Noise in a Diffusion Model for Anomaly DetectionJustin Tebbe, Jawad Tayyub. 3940-3949 [doi]
- SplatPose & Detect: Pose-Agnostic 3D Anomaly DetectionMathis Kruse, Marco Rudolph, Dominik Woiwode, Bodo Rosenhahn. 3950-3960 [doi]
- Dynamic Distinction Learning: Adaptive Pseudo Anomalies for Video Anomaly DetectionDemetris Lappas, Vasileios Argyriou, Dimitrios Makris 0001. 3961-3970 [doi]
- COOD: Combined out-of-distribution detection using multiple measures for anomaly & novel class detection in large-scale hierarchical classificationLaurens E. Hogeweg, Rajesh Gangireddy, Django Brunink, Vincent J. Kalkman, Ludo Cornelissen, Jacob W. Kamminga. 3971-3980 [doi]
- Model-guided contrastive fine-tuning for industrial anomaly detectionAitor Artola, Yannis Kolodziej, Jean-Michel Morel, Thibaud Ehret. 3981-3991 [doi]
- Tracklet-based Explainable Video Anomaly LocalizationAshish Singh, Michael J. Jones 0001, Erik G. Learned-Miller. 3992-4001 [doi]
- Context-aware Video Anomaly Detection in Long-Term DatasetsZhengye Yang, Richard J. Radke. 4002-4011 [doi]
- Manifold DivideMix: A Semi-Supervised Contrastive Learning Framework for Severe Label NoiseFahimeh Fooladgar, Minh Nguyen Nhat To, Parvin Mousavi, Purang Abolmaesumi. 4012-4021 [doi]
- LogicAL: Towards logical anomaly synthesis for unsupervised anomaly localizationYing Zhao. 4022-4031 [doi]
- DMR: Disentangling Marginal Representations for Out-of-Distribution DetectionDasol Choi, Dongbin Na. 4032-4041 [doi]
- BMAD: Benchmarks for Medical Anomaly DetectionJinan Bao, Hanshi Sun, Hanqiu Deng, Yinsheng He, Zhaoxiang Zhang 0003, Xingyu Li. 4042-4053 [doi]
- DELTA: Decoupling Long-Tailed Online Continual LearningSiddeshwar Raghavan, Jiangpeng He, Fengqing Zhu 0001. 4054-4064 [doi]
- Unveiling the Anomalies in an Ever-Changing World: A Benchmark for Pixel-Level Anomaly Detection in Continual LearningNikola Bugarin, Jovana Bugaric, Manuel Barusco, Davide Dalle Pezze, Gian Antonio Susto. 4065-4074 [doi]
- Calibrating Higher-Order Statistics for Few-Shot Class-Incremental Learning with Pre-trained Vision TransformersDipam Goswami, Bartlomiej Twardowski, Joost van de Weijer 0001. 4075-4084 [doi]
- Active Data Collection and Management for Real-World Continual Learning via Pretrained OracleVivek Chavan, Paul Koch, Marian Schlüter, Clemens Briese, Jörg Krüger. 4085-4096 [doi]
- Class-Incremental Mixture of Gaussians for Deep Continual LearningLukasz Korycki, Bartosz Krawczyk. 4097-4106 [doi]
- MultIOD: Rehearsal-free Multihead Incremental Object DetectorEden Belouadah, Arnaud Dapogny, Kevin Bailly. 4107-4117 [doi]
- Wake-Sleep Energy Based Models for Continual LearningVaibhav Singh, Anna Choromanska, Shuang Li 0013, Yilun Du. 4118-4127 [doi]
- Continual-Zoo: Leveraging Zoo Models for Continual Classification of Medical ImagesNourhan Bayasi, Ghassan Hamarneh, Rafeef Garbi. 4128-4138 [doi]
- TAME: Task Agnostic Continual Learning using Multiple ExpertsHaoran Zhu, Maryam Majzoubi, Arihant Jain, Anna Choromanska. 4139-4148 [doi]
- Tackling Domain Shifts in Person Re-Identification: A Survey and AnalysisVuong D. Nguyen, Samiha Mirza, Abdollah Zakeri, Ayush Gupta, Khadija Khaldi, Rahma Aloui, Pranav Mantini, Shishir K. Shah, Fatima A. Merchant. 4149-4159 [doi]
- Calibration of Continual Learning ModelsLanpei Li, Elia Piccoli, Andrea Cossu, Davide Bacciu, Vincenzo Lomonaco. 4160-4169 [doi]
- VLM-PL: Advanced Pseudo Labeling approach for Class Incremental Object Detection via Vision-Language ModelJunsu Kim, Yunhoe Ku, Jihyeon Kim, Junuk Cha, SeungRyul Baek. 4170-4181 [doi]
- The Expanding Scope of the Stability Gap: Unveiling its Presence in Joint Incremental Learning of Homogeneous TasksSandesh Kamath, Albin Soutif-Cormerais, Joost van de Weijer 0001, Bogdan Raducanu. 4182-4186 [doi]
- Continual Learning with Weight InterpolationJedrzej Kozal, Jan Wasilewski, Bartosz Krawczyk, Michal Wozniak 0001. 4187-4195 [doi]
- An analysis of best-practice strategies for replay and rehearsal in continual learningAlexander Krawczyk, Alexander Gepperth. 4196-4204 [doi]
- FedProK: Trustworthy Federated Class-Incremental Learning via Prototypical Feature Knowledge TransferXin Gao, Xin Yang 0012, Hao Yu, Yan Kang 0001, Tianrui Li 0001. 4205-4214 [doi]
- Collaborative Visual Place Recognition through Federated LearningMattia Dutto, Gabriele Moreno Berton, Debora Caldarola, Eros Fanì, Gabriele Trivigno, Carlo Masone. 4215-4225 [doi]
- On the Efficiency of Privacy Attacks in Federated LearningNawrin Tabassum, Ka Ho Chow, Xuyu Wang, Wenbin Zhang 0002, Yanzhao Wu 0001. 4226-4235 [doi]
- Federated Hyperparameter Optimization through Reward-Based Strategies: Challenges and InsightsKrishna Kanth Nakka, Ahmed Frikha 0002, Ricardo Mendes, Xue Jiang, Xuebing Zhou. 4236-4244 [doi]
- DeDoDe v2: Analyzing and Improving the DeDoDe Keypoint DetectorJohan Edstedt, Georg Bökman, Zhenjun Zhao. 4245-4253 [doi]
- Affine-based Deformable Attention and Selective Fusion for Semi-dense MatchingHongkai Chen, Zixin Luo, Yurun Tian, Xuyang Bai, Ziyu Wang, Lei Zhou, Mingmin Zhen, Tian Fang, David McKinnon, Yanghai Tsin, Long Quan. 4254-4263 [doi]
- EarthMatch: Iterative Coregistration for Fine-grained Localization of Astronaut PhotographyGabriele Moreno Berton, Gabriele Goletto, Gabriele Trivigno, Alex Stoken, Barbara Caputo, Carlo Masone. 4264-4274 [doi]
- XoFTR: Cross-modal Feature Matching TransformerÖnder Tuzcuoglu, Aybora Köksal, Bugra Sofu, Sinan Kalkan, A. Aydin Alatan. 4275-4286 [doi]
- Are Deep Learning Models Pre-trained on RGB Data Good Enough for RGB-Thermal Image Retrieval?Amulya Pendota, Sumohana S. Channappayya. 4287-4296 [doi]
- Finding AI-Generated Faces in the WildGonzalo J. Aniano Porcile, Jack Gindi, Shivansh Mundra, James R. Verbus, Hany Farid. 4297-4305 [doi]
- An Investigation into the Impact of AI-Powered Image Enhancement on Forensic Facial RecognitionJustin Norman, Hany Farid. 4306-4314 [doi]
- Lost in Translation: Lip-Sync Deepfake Detection from Audio-Video MismatchMatyas Bohacek, Hany Farid. 4315-4323 [doi]
- Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media ForensicsShan Jia, Reilin Lyu, Kangran Zhao, Yize Chen, Zhiyuan Yan 0002, Yan Ju, Chuanbo Hu, Xin Li 0005, Baoyuan Wu, Siwei Lyu. 4324-4333 [doi]
- E3: Ensemble of Expert Embedders for Adapting Synthetic Image Detectors to New Generators Using Limited DataAref Azizpour, Tai D. Nguyen, Manil Shrestha, Kaidi Xu, Edward Kim, Matthew C. Stamm. 4334-4344 [doi]
- Fusion Transformer with Object Mask Guidance for Image Forgery AnalysisDimitrios Karageorgiou, Giorgos Kordopatis-Zilos, Symeon Papadopoulos. 4345-4355 [doi]
- Raising the Bar of AI-generated Image Detection with CLIPDavide Cozzolino, Giovanni Poggi, Riccardo Corvi, Matthias Nießner, Luisa Verdoliva. 4356-4366 [doi]
- StampOne: Addressing Frequency Balance in Printer-proof SteganographyFarhad Shadmand, Iurii Medvedev, Luiz Schirmer, João Marcos 0002, Nuno Gonçalves 0001. 4367-4376 [doi]
- Building Secure and Engaging Video Communication by Using Monitor IlluminationJun Myeong Choi, Johnathan Leung, Noah Frahm, Max Christman, Gedas Bertasius, Roni Sengupta. 4377-4386 [doi]
- Audio Provenance Analysis in Heterogeneous Media SetsMilica Gerhardt, Luca Cuccovillo, Patrick Aichroth. 4387-4396 [doi]
- Beyond Deepfake Images: Detecting AI-Generated VideosDanial Samadi Vahdati, Tai D. Nguyen, Aref Azizpour, Matthew C. Stamm. 4397-4408 [doi]
- Audio Transformer for Synthetic Speech Detection via Multi-Formant AnalysisLuca Cuccovillo, Milica Gerhardt, Patrick Aichroth. 4409-4417 [doi]
- FairSSD: Understanding Bias in Synthetic Speech DetectorsAmit Kumar Singh Yadav, Kratika Bhagtani, Davide Salvi, Paolo Bestagini, Edward J. Delp. 4418-4428 [doi]
- Beyond the Screen: Evaluating Deepfake Detectors under Moiré Pattern EffectsRazaib Tariq, Minji Heo, Simon S. Woo, Shahroz Tariq. 4429-4439 [doi]
- Do More With What You Have: Transferring Depth-Scale from Labeled to Unlabeled DomainsAlexandra Dana, Nadav Carmel, Amit Shomer, Ofer Manela, Tomer Peleg. 4440-4450 [doi]
- CenterPoint Transformer for BEV Object Detection with Automotive RadarLoveneet Saini, Yu Su, Hasan Tercan, Tobias Meisen. 4451-4460 [doi]
- Are NeRFs ready for autonomous driving? Towards closing the real-to-simulation gapCarl Lindström, Georg Hess, Adam Lilja, Maryam Fatemi, Lars Hammarstrand, Christoffer Petersson, Lennart Svensson. 4461-4471 [doi]
- Multi-Stream Cellular Test-Time Adaptation of Real-Time Models Evolving in Dynamic EnvironmentsBenoît Gérin, Anaïs Halin, Anthony Cioppa, Maxim Henry, Bernard Ghanem, Benoît Macq, Christophe De Vleeschouwer, Marc Van Droogenbroeck. 4472-4482 [doi]
- TrajFine: Predicted Trajectory Refinement for Pedestrian Trajectory ForecastingKuan-Lin Wang, Li-Wu Tsao, Jhih-Ciang Wu, Hong-Han Shuai, Wen-Huang Cheng. 4483-4492 [doi]
- OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation NetworksSophia Sirko-Galouchenko, Alexandre Boulch, Spyros Gidaris, Andrei Bursuc, Antonín Vobecký, Patrick Pérez, Renaud Marlet. 4493-4503 [doi]
- Potential Risk Localization via Weak Labeling out of Blind SpotKota Shimomura, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi. 4504-4513 [doi]
- Click, Crop & Detect: One-Click Offline Annotation for Human-in-the-Loop 3D Object Detection on Point CloudsNitin Kumar Saravana Kannan, Matthias Reuse, Martin Simon. 4514-4525 [doi]
- Lift-Attend-Splat: Bird's-eye-view camera-lidar fusion using transformersJames Gunn, Zygmunt Lenyk, Anuj Sharma, Andrea Donati, Alexandru Buburuzan, John Redford, Romain Mueller. 4526-4536 [doi]
- DuST: Dual Swin Transformer for Multi-modal Video and Time-Series ModelingLiang Shi, Yixin Chen, Meimei Liu, Feng Guo. 4537-4546 [doi]
- TFNet: Exploiting Temporal Cues for Fast and Accurate LiDAR Semantic SegmentationRong Li, Shijie Li, Xieyuanli Chen, Teli Ma, Juergen Gall, Junwei Liang 0001. 4547-4556 [doi]
- CaBins: CLIP-based Adaptive Bins for Monocular Depth EstimationEunjin Son, Sang-Jun Lee. 4557-4567 [doi]
- Exploring Real World Map Change Generalization of Prior-Informed HD Map Prediction ModelsSamuel M. Bateman, Ning Xu, H. Charles Zhao, Yael Ben Shalom, Vince Gong, Greg Long, Will Maddern. 4568-4578 [doi]
- MULi-Ev: Maintaining Unperturbed LiDAR-Event CalibrationMathieu Cocheteux, Julien Moreau 0001, Franck Davoine. 4579-4586 [doi]
- The 6th Affective Behavior Analysis in-the-wild (ABAW) CompetitionDimitrios Kollias, Panagiotis Tzirakis, Alan Cowen, Stefanos Zafeiriou, Irene Kotsia, Alice Baird, Chris Gagne 0001, Chunchang Shao, Guanyu Hu. 4587-4598 [doi]
- Unsupervised Multi-Person 3D Human Pose Estimation From 2D Poses AlonePeter Hardy, Hansung Kim. 4599-4603 [doi]
- Multi-Task Multi-Modal Self-Supervised Learning for Facial Expression RecognitionMarah Halawa, Florian Blume, Pia Bideau, Martin Maier, Rasha Abdel Rahman, Olaf Hellwich. 4604-4614 [doi]
- Purposeful Regularization with Reinforcement Learning for Facial Expression Recognition In-the-WildSanghwa Hong. 4615-4624 [doi]
- Joint Multimodal Transformer for Emotion Recognition in the WildPaul Waligora, Muhammad Haseeb Aslam, Muhammad Osama Zeeshan, Soufiane Belharbi, Alessandro Lameiras Koerich, Marco Pedersoli, Simon Bacon, Eric Granger. 4625-4635 [doi]
- CMOSE: Comprehensive Multi-Modality Online Student Engagement Dataset with High-Quality LabelsChi-Hsuan Wu, Shih-Yang Liu, Xijie Huang, Xingbo Wang, Rong Zhang, Luca Minciullo, Wong Kai Yiu, Kenny Kwan, Kwang-Ting Cheng. 4636-4645 [doi]
- 3D Human Pose Estimation with Occlusions: Introducing BlendMimic3D Dataset and GCN RefinementFilipa Lino, Carlos Santiago, Manuel Marques. 4646-4656 [doi]
- Unimodal Multi-Task Fusion for Emotional Mimicry Intensity PredictionTobias Hallmen, Fabian Deuser, Norbert Oswald, Elisabeth André. 4657-4665 [doi]
- Enhancing Emotion Recognition with Pre-trained Masked Autoencoders and Sequential LearningWeiwei Zhou, Jiada Lu, Chenkun Ling, Weifeng Wang, Shaowei Liu. 4666-4672 [doi]
- MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wildKateryna Chumachenko, Alexandros Iosifidis, Moncef Gabbouj. 4673-4682 [doi]
- CAGE: Circumplex Affect Guided Expression InferenceNiklas Wagner, Felix Mätzler, Samed Rouven Vossberg, Helen Schneider, Svetlana Pavlitska, J. Marius Zöllner. 4683-4692 [doi]
- Video Representation Learning for Conversational Facial Expression Recognition Guided by Multiple View ReconstructionValeriya Strizhkova, Laura M. Ferrari, Hadi Kachmar, Antitza Dantcheva, François Brémond. 4693-4702 [doi]
- Leveraging Pre-trained Multi-task Deep Models for Trustworthy Facial Analysis in Affective Behaviour Analysis in-the-WildAndrey V. Savchenko. 4703-4712 [doi]
- Drone-HAT: Hybrid Attention Transformer for Complex Action Recognition in Drone Surveillance VideosMustaqeem Khan 0001, Jamil Ahmad, Abdulmotaleb El-Saddik, Wail Gueaieb, Giulia De Masi, Fakhri Karray. 4713-4722 [doi]
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature SignalsAlexander Vedernikov, Puneet Kumar 0003, Haoyu Chen 0001, Tapio Seppänen, Xiaobai Li. 4723-4732 [doi]
- Learning Transferable Compound Expressions from Masked AutoEncoder PretrainingFeng Qiu, Heming Du, Wei Zhang, Chen Liu, Lincheng Li, Tianchen Guo, Xin Yu. 4733-4741 [doi]
- Language-guided Multi-modal Emotional Mimicry Intensity EstimationFeng Qiu, Wei Zhang, Chen Liu, Lincheng Li, Heming Du, Tianchen Guo, Xin Yu. 4742-4751 [doi]
- Zero-Shot Audio-Visual Compound Expression Recognition Method based on Emotion Probability FusionElena Ryumina, Maxim Markitantov, Dmitry Ryumin, Heysem Kaya, Alexey Karpov 0001. 4752-4760 [doi]
- An Effective Ensemble Learning Framework for Affective Behaviour AnalysisWei Zhang, Feng Qiu, Chen Liu, Lincheng Li, Heming Du, Tianchen Guo, Xin Yu. 4761-4772 [doi]
- Multi-modal Arousal and Valence Estimation under Noisy ConditionsDenis Dresvyanskiy, Maxim Markitantov, Jiawei Yu, Heysem Kaya, Alexey Karpov 0001. 4773-4783 [doi]
- Emotic Masked Autoencoder on Dual-views with Attention Fusion for Facial Expression RecognitionXuan-Bach Nguyen, Hoang-Thien Nguyen, Thanh Huy Nguyen, Nhu-Tai Do, Quang Vinh Dinh. 4784-4792 [doi]
- REFA: Real-time Egocentric Facial Animations for Virtual RealityQiang Zhang 0008, Tong Xiao, Haroun Habeeb, Larissa Laich, Sofien Bouaziz, Patrick Snape, Wenjing Zhang, Matthew Cioffi, Peizhao Zhang, Pavel Pidlypenskyi, Winnie Lin, Luming Ma, Mengjiao Wang 0002, Kunpeng Li, Chengjiang Long, Steven Song, Martin Prazák, Alexander Sjoholm, Ajinkya Deogade, Jaebong Lee, Julio Delgado Mangas, Amaury Aubel. 4793-4802 [doi]
- Recursive Joint Cross-Modal Attention for Multimodal Fusion in Dimensional Emotion RecognitionR. Gnana Praveen, Jahangir Alam 0001. 4803-4813 [doi]
- AUD-TGN: Advancing Action Unit Detection with Temporal Convolution and GPT-2 in Wild Audiovisual ContextsJun Yu, Zerui Zhang, Zhihong Wei, Gongpeng Zhao, Zhongpeng Cai, Yongqi Wang, Guochen Xie, Jichao Zhu, Wangyuan Zhu, Qingsong Liu, Jiaen Liang. 4814-4821 [doi]
- Uncovering Hidden Emotions with Adaptive Multi-Attention Graph NetworksAnkith Jain Rakesh Kumar, Bir Bhanu. 4822-4831 [doi]
- Evaluating the Effectiveness of Video Anomaly Detection in the Wild: Online Learning and Inference for Real-world DeploymentShanle Yao, Ghazal Alinezhad Noghre, Armin Danesh Pazho, Hamed Tabkhi. 4832-4841 [doi]
- Unravelling Robustness of Deep Face Recognition Networks Against Illicit Drug Abuse ImagesHruturaj Dhake, Akshay Agarwal 0001. 4842-4848 [doi]
- EmotiEffNet and Temporal Convolutional Networks in Video-based Facial Expression Recognition and Action Unit DetectionAndrey V. Savchenko, Anna P. Sidorova. 4849-4859 [doi]
- Emotion Recognition Using Transformers with Random MaskingSeongjae Min, Junseok Yang, Sejoon Lim. 4860-4865 [doi]
- Efficient Feature Extraction and Late Fusion Strategy for Audiovisual Emotional Mimicry Intensity EstimationJun Yu, Wangyuan Zhu, Jichao Zhu, Zhongpeng Cai, Gongpeng Zhao, Zerui Zhang, Guochen Xie, Zhihong Wei, Qingsong Liu, Jiaen Liang. 4866-4872 [doi]
- Multi Model Ensemble for Compound Expression RecognitionJun Yu, Jichao Zhu, Wangyuan Zhu, Zhongpeng Cai, Gongpeng Zhao, Zhihong Wei, Guochen Xie, Zerui Zhang, Qingsong Liu, Jiaen Liang. 4873-4879 [doi]
- Exploring Facial Expression Recognition through Semi-Supervised Pre-training and Temporal ModelingJun Yu, Zhihong Wei, Zhongpeng Cai, Gongpeng Zhao, Zerui Zhang, Yongqi Wang, Guochen Xie, Jichao Zhu, Wangyuan Zhu, Qingsong Liu, Jiaen Liang. 4880-4887 [doi]
- CUE-Net: Violence Detection Video Analytics with Spatial Cropping, Enhanced UniformerV2 and Modified Efficient Additive AttentionDamith Chamalke Senadeera, Xiaoyun Yang, Dimitrios Kollias, Gregory G. Slabaugh. 4888-4897 [doi]
- One class classification-based quality assurance of organs-at-risk delineation in radiotherapyYihao Zhao, Cuiyun Yuan, Ying Liang, Yang Li, Chunxia Li, Man Zhao, Jun Hu, Ningze Zhong, Chenbin Liu. 4898-4906 [doi]
- Domain adaptation, Explainability & Fairness in AI for Medical Image Analysis: Diagnosis of COVID-19 based on 3-D Chest CT-scansDimitrios D. Kollias, Anastasios Arsenos, Stefanos Kollias. 4907-4914 [doi]
- Comparative Analysis of Generalization and Harmonization Methods for 3D Brain fMRI Images: A Case Study on OpenBHB DatasetSoroosh Safari Loaliyan, Greg Ver Steeg. 4915-4923 [doi]
- A Closer Look at Spatial-Slice Features Learning for COVID-19 DetectionChih-Chung Hsu, Chia-Ming Lee, Yang Fan Chiang, Yi-Shiuan Chou, Chih-Yu Jiang, Shen-Chieh Tai, Chi-Han Tsai. 4924-4934 [doi]
- Interpreting COVID Lateral Flow Tests' Results with Foundation ModelsStuti Pandey, Josh Myers-Dean, Jarek Reynolds, Danna Gurari. 4935-4942 [doi]
- Fetal ECG Extraction on Time-Frequency Domain using Conditional GANVuong D. Nguyen. 4943-4949 [doi]
- Focusing on What Matters: Fine-grained Medical Activity Recognition for Trauma Resuscitation via Actor TrackingWenjin Zhang, Keyi Li, Sen Yang 0002, Sifan Yuan, Ivan Marsic, Genevieve J. Sippel, Mary S. Kim, Randall S. Burd. 4950-4958 [doi]
- How SAM Perceives Different mp-MRI Brain Tumor Domains?Cecilia Diana-Albelda, Roberto Alcover-Couso, Álvaro García-Martín, Jesús Bescós. 4959-4970 [doi]
- LaPA: Latent Prompt Assist Model for Medical Visual Question AnsweringTiancheng Gu, Kaicheng Yang, Dongnan Liu, Weidong Cai 0001. 4971-4980 [doi]
- SegFormer3D: an Efficient Transformer for 3D Medical Image SegmentationShehan Perera, Pouyan Navard, Alper Yilmaz. 4981-4988 [doi]
- PP-SAM: Perturbed Prompts for Robust Adaption of Segment Anything Model for Polyp SegmentationMd Mostafijur Rahman, Mustafa Munir, Debesh Jha, Ulas Bagci, Radu Marculescu. 4989-4995 [doi]
- Using Counterfactual Information for Breast Classification DiagnosisMiguel Cardoso,