- Vision Projector: Improving Zero-Shot Composed Image Retrieval at Inference. Hoang Bao Le, Allie Tran, Binh T. Nguyen 0001, Liting Zhou, Cathal Gurrin. 1-5
- Predicting Moral Values in Lyrics Through Audio. Charalampos Saitis, Ben Heyderman, Vjosa Preniqi, Kyriaki Kalimeri, Johan Pauwels. 1-7
- Label-Efficient Skeleton-Based Recognition with Stable Graph Convnets. Hichem Sahbi. 1-8
- Evaluating the Recognisability of AI-Generated Familiar Images in a Closed Environment with a Gamified Approach. Marc Gallofré Ocaña, Balázs Mosolygó, Bahareh Fatemi. 1-7
- NN Watermarking for Face Segmentation Task. Carl De Sousa Trias, Mihai Mitrea. 1-8
- Towards Graph-Based Federated Learning: ModelNet - A ResNet-based Model Classification Dataset. Abhisek Ray, Lukas Esterle. 1-7
- Zero-Shot Vision-Language Model for Event Detection in Smart Surveillance. Younes Kebour, Smaïl Niar, Nacim Ihaddadene, Abdelghani Bekrar, Hammouda Elbez. 1-8
- EcoStream: A Resource Utilization and Power Consumption Dataset in Multimedia Streaming for Sustainability Analysis. Tariq Al Shoura, Reza Razavi, Mohammad Moshirpour. 1-8
- MERCI: A Multimodal Dataset for Personalised and Emotionally-Aware Dialogues. Mohammed Althubyani, Zhijin Meng, Shengyuan Xie, Francisco Cruz 0002, Imran Razzak, Mukesh Prasad, Eduardo B. Sandoval, A. Baki Kocaballi. 1-7
- Hockey2D: A Keypoint-Based Framework for Ice Hockey Rink Localization and Object Mapping. Mehdi Houshmand Sarkhoosh, Cise Midoglu, Saeed Shafiee Sabet, Tomas Kupka, Pål Halvorsen. 1-7
- MSS: A Multilingual Spoofed Speech Dataset with Code-Switching for Anti-Spoofing Measures. Muhammad Hamza, Hafsa Ilyas, Junaid Mir, Ali Javed, Muhammad Haroon Yousaf, Ahmed Zoha. 1-7
- MultiHuSE: A Multimodal Dataset for Humour Styles and Emotions. Mary Ogbuka Kenneth, Foaad Khosmood, Abbas Edalat. 1-7
- Text-Oriented Image Query Representation for Zero-Shot Composed Image Retrieval. Pavan K. Rachabathuni, Andrea Ciamarra, Roberto Caldelli, Marco Bertini 0001. 1-7
- GeMix: Conditional GAN-Based Mixup for Improved Medical Image Augmentation. Hugo Carlesso, Maria Eliza Patulea, Moncef Garouani, Radu-Tudor Ionescu, Josiane Mothe. 1-7
- Rethinking Wine Tasting for Chinese Consumers: A Service Design Approach Enhanced by Multimodal Personalization. Xinyang Shan, Yuanyuan Xu, Tian Xia, Yin-Shan Lin. 1-5
- A Comparative Study of Conversational and Conventional Search Methods for Image Retrieval. Anastasiia Potiagalova, Joemon J. Jose, Benjamin R. Cowan, Gareth J. F. Jones. 1-7
- Dual-Objective Adversarial Disentanglement for Protecting Speech Data used for Diagnosing Parkinson's Disease. Mehtab Ur Rahman, Martha A. Larson, Louis ten Bosch, Cristian Tejedor-Garcia. 1-6
- Does CLIP Perceive Art the Same Way We Do? Andrea Asperti, Leonardo Dessì, Maria Chiara Tonetti, Nico Wu. 1-8
- Lip Reading Across Languages: A Cross-Modal Framework Leveraging Foundation Models. Ruxandra Tapu, Bogdan Mocanu. 1-7
- FPN-Based Multi-Scale Feature Fusion for Robust 3D Pedestrian Detection in Crowded Scenes. Kiyotaka Matsue, Kenta Umene, Nghia Dao, Hieu Nguyen, Manh Phan. 1-7
- Mitigating Shortcut Learning in Online Action Detection and Anticipation via Cross-Modal Semantic Alignment. Sensen Wang, Yuehu Liu, Chi Zhang. 1-7
- A Survey of Information Disorder on Video-Sharing Platforms. Meiyu Li, Wei Ai, Naeemul Hassan. 1-10
- Beyond CNNs: Efficient Fine-Tuning of Multi-Modal LLMs for Object Detection on Low-Data Regimes. Nirmal Elamon, Rouzbeh Davoudi. 1-6
- Anonymisation of Visual Lifelogs using Diffusion Models and Large Language Models. Minh Quang Le, Graham Healy, Liting Zhou, Cathal Gurrin. 1-7
- Personalizing Retrieval Using Joint Embeddings; or "the Return of Fluffy". Bruno Korbar, Andrew Zisserman. 1-8
- Multi-modal Context Reranking for Lifelog Question Answering. Quang-Linh Tran, Ly-Duyen Tran, Binh T. Nguyen 0001, Gareth J. F. Jones, Cathal Gurrin. 1-8
- Music4All A+A: A Multimodal Dataset for Music Information Retrieval Tasks. Jonas Geiger, Marta Moscati, Shah Nawaz, Markus Schedl. 1-7
- An Experimental Study on Generating Plausible Textual Explanations for Video Summarization. Thomas Eleftheriadis, Evlampios Apostolidis, Vasileios Mezaris. 1-8
- Dialogue-AV: A Dialogue-Attended Audiovisual Dataset. Luís Vilaça, Paula Viana, Yi Yu 0001. 1-8
- Fusion of Global and Local Features with Multi-Inverted Indices for Image Retrieval. Li Weng, Xizhe Wang, Qianneng Wang, Bingya Wu. 1-8
- ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval. Guanqi Zhan, Yuanpei Liu, Kai Han 0001, Weidi Xie, Andrew Zisserman. 1-8
- Masked Spikformer: Gaussian based and Random Spike Masking for Energy-Efficient Spiking Transformers. Oumaima Marsi, Sebastien Ambellouis, José Mennesson, Cyril Meurie, Anthony Fleury, Charles Tatkeu. 1-7
- TREB: Temporal Refinement of Egocentric Body Pose. Bruno Henriques, Benjamin Allaert, Nicolas Sutton-Charani, Pierre Slangen, Jean-Philippe Vandeborre. 1-6
- BandNaviHD: Band-Member Backtrack Interface Based on Member History Information. Masatoshi Hamanaka. 1-4
- Robust Multimedia Verification of Cheapfakes and Deepfakes via External Context Leveraging. Minh Nhat Nguyen, Trong-Nghia Tran, Minh-Triet Tran, Duc-Tien Dang-Nguyen, Trong-Le Do. 1-8
- GTR: General Handwritten Lines Text Recognition Dataset. Xu Ji, Haizhao Sun, Yu Ning, Ming Wu 0001, Chuang Zhang. 1-7
- A New Pipeline for Extracting and Clustering Sub-Images from Unannotated Complex Image Datasets. Chafic Abou Akar, Christian Beddawi, Marc Kamradt, Abdallah Makhoul. 1-7
- SeqBench: Benchmarking Sequential Narrative Generation in Text-to-Video Models. Zhengxu Tang, Zizheng Wang, Luning Wang, Zitao Shuai, Chenhao Zhang, Siyu Qian, Yirui Wu, Bohao Wang, Haosong Rao, Zhenyu Yang, Chenwei Wu 0006. 1-7
- GenFlow: Interactive Modular System for Image Generation. Duc Hung Nguyen, Huu-Phuc Huynh, Minh-Triet Tran, Trung-Nghia Le. 1-7
- Breaking the 2D Dependency: What Limits 3D-Only Open-Vocabulary Scene Understanding. Domenico D'Orsi, Fabio Carrara, Fabrizio Falchi, Nicola Tonellotto. 1-5
- Facilitating Interactive Image Labelling Using Fine-Tuned SAM2. Hermann Fürntratt, Werner Bailer. 1-5
- MMMS: Multi-Modal Multi-Surface Interactive Segmentation. Robin Schön, Julian Lorenz, Katja Ludwig, Daniel Kienle, Rainer Lienhart. 1-8
- TrueEar: A Lightweight and Accurate Fake Voice Detector for Mobile Devices. Cameron Baird, Ke Li, Dan Lin 0001. 1-8
- VoiceVision: AI-Powered Speaker-Aware Cropping and Content Indexing for Multi-Speaker Videos. Mehdi Houshmand Sarkhoosh, Cise Midoglu, Saeed Shafiee Sabet, Tomas Kupka, Pål Halvorsen. 1-5
- DSI-3D: Differentiable Search Index for Point Clouds Retrieval. Chahine-Nicolas Zede, Laurent Caraffa, Valérie Gouet-Brunet. 1-7
- ReViewQwen: An Explainable Vision-Language Model for Discrepancy Detection in Multimodal E-Commerce Reviews. Sandeep Kalari, Mohan Sunkara, Dominik Soós, Vikas Ashok, Ravi Mukkamala. 1-7
- U-Cker: Initial Development of an Interactive Video Retrieval System for Novice Users. Kazuya Ueki, Ryo Muto, Takuya Wada, Ryota Akaba, Genesis Faith Fernandez. 1-5
- TSalV360: A Method and Dataset for Text-driven Saliency Detection in 360-Degrees Videos. Ioannis Kontostathis, Evlampios Apostolidis, Vasileios Mezaris. 1-8
- Accelerating Vector Search at Scale: BAM-ANN with Batch-Aware Memory-Disk Hybrid Indexing. M. M. Mahabubur Rahman, Jelena Tesic. 1-7
- Melanoma Segmentation with SAM-Like Models: Assessing the Influence and Limits of Bounding Box Input. Nicolas Martin, Philippe Mulhem, Jean-Pierre Chevallet. 1-7
- AgriPotential: A Novel Multi-Spectral and Multi-Temporal Remote Sensing Dataset for Agricultural Potentials. Mohammad El Sakka, Caroline De Pourtales, Lotfi Chaâri, Josiane Mothe. 1-6
- MI-Cap: A Multi-Modal Interpretable Model for Video Captioning. Antoine Hanna-Asaad, Decky Aspandi-Latif, Titus Zaharia. 1-8
- Toward Content-Based Indexing and Retrieval of Head and Neck CT With Abscess Segmentation. Thao Thi Phuong Dao, Tan-Cong Nguyen, Trong-Le Do, Truong Hoang Viet, Nguyen Chi Thanh, Huynh Nguyen Thuan, Do Vo Cong Nguyen, Minh-Khoi Pham, Mai-Khiem Tran, Viet-Tham Huynh, Trong Thuan Nguyen, Trung-Nghia Le, Thanh Nhan Vo, Tam V. Nguyen 0002, Minh-Triet Tran, Thanh Dinh Le. 1-8
- Enhancing Vision-Language Model Pre-Training with Image-Text Pair Pruning Based on Word Frequency. Mingliang Liang, Martha A. Larson. 1-7
- SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding. Sushant Gautam, Cise Midoglu, Vajira Lasantha Thambawita, Michael A. Riegler, Pål Halvorsen, Mubarak Shah. 1-8
- Semi-Supervised Approach to Detect Human Discontent from Real-Life Behaviour Data. Elena Vildjiounaite, Vesa Kyllönen, Johanna Kallio, Pauli Räsänen. 1-7
- Media Search: A Multi-Stage Image Retrieval Framework with Enriched Image Captioning. Ayse Vildan Nurdag, Mete Mert Birdal, Yusuf Yazici, Baris Özcan, Erkut Arican. 1-6
- Historical Postcard Stamp Content Understanding. Matthieu Pelingre, Salvatore Tabbone. 1-7
- Understanding Indoor Context in an Office Environment: An Empirical Study on Air Stuffiness Perception. Johanna Kallio, Jussi Liikka, Satu-Marja Mäkelä, Atte Kinnula, Elena Vildjiounaite. 1-6
- Toward an Energy-Efficient and Explainable Neural Network Architecture for Detection of Breast Cancer in Mammography. Alireza Siyavashi, Christian Herglotz. 1-7
- Examining Performance Disparities Between Expert and Novice Users in Interactive Video Retrieval. Omar Shahbaz Khan, Ujjwal Sharma 0001, Stevan Rudinac, Björn Þór Jónsson 0001. 1-4
- Exploring the Effect of Size, Architecture and Fine-Tuning Hyperparameters on Large Visual-Language Model Adaptation for Video Memorability Prediction. David Luna-García, Iván Martín-Fernández, Sergio Esteban Romero, Manuel Gil-Martín, Fernando Fernández-Martínez. 1-7
- A Mixed-Methods Investigation of XR Security Warnings - Lessons Learned. Junyi Zou, Riccardo Bovo, Ali Hamza, Georgios Loukas. 1-8
- Novice-Friendly Video Retrieval in Mixed Reality with Vitrivr-VR. Florian Spiess 0001, Heiko Schuldt. 1-3
- First-Person Human Sensing for Upper Limb Neuroprosthesis Control: 6D Pose Estimation of Objects to Grasp. Ander Etxezarreta, Jenny Benois-Pineau, Renaud Péteri, Lucas Bardisbanian, Aymar de Rugy. 1-6
- Probabilistic Fusion Model for Multi-Label Media Content Classification. Javier Carreno, Khuong An Nguyen, Zhiyuan Luo 0001, Andrew Fish. 1-7
- Explanatory Interactive Machine Learning for Bias Mitigation in Visual Gender Classification. Nathanya Queby Satriani, Djordje Slijepcevic, Markus Schedl, Matthias Zeppelzauer. 1-8