- Removing Stray-Light for Wild-Field Fundus Image Fusion Based on Large Generative Models. Jun Wu 0022, Mingxin He, Yang Liu, Jingjie Lin, Zeyu Huang, Dayong Ding. 3-16
- Training-Free Region Prediction with Stable Diffusion. Yuma Honbu, Keiji Yanai. 17-31
- Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites. Lei Wang 0185, Jiabang He, Shenshen Li, Ning Liu, Ee-Peng Lim. 32-45
- GDTNet: A Synergistic Dilated Transformer and CNN by Gate Attention for Abdominal Multi-organ Segmentation. Can Zhang, Zhiqiang Wang, Yuan Zhang, Xuanya Li, Kai Hu 0002. 46-57
- Fine-Grained Multi-modal Fundus Image Generation Based on Diffusion Models for Glaucoma Classification. Xinyue Liu, Gang Yang 0001, Yang Zhou, Yajie Yang, Weichen Huang, Dayong Ding, Jun Wu. 58-70
- Adapting Pretrained Large-Scale Vision Models for Face Forgery Detection. Lantao Wang, Chao Ma. 71-85
- Towards Cross-Modal Point Cloud Retrieval for Indoor Scenes. Fuyang Yu, Zhen Wang, Dongyuan Li, Peide Zhu, Xiaohui Liang, Xiaochuan Wang, Manabu Okumura. 89-102
- Correlation Visualization Under Missing Values: A Comparison Between Imputation and Direct Parameter Estimation Methods. Nhat-Hao Pham, Khanh-Linh Vo, Mai-Anh Vu, Thu Nguyen, Michael A. Riegler, Pål Halvorsen, Binh T. Nguyen 0001. 103-116
- IFI: Interpreting for Improving: A Multimodal Transformer with an Interpretability Technique for Recognition of Risk Events. Rupayan Mallick, Jenny Benois-Pineau, Akka Zemmari. 117-131
- Ookpik- A Collection of Out-of-Context Image-Caption Pairs. Kha-Luan Pham, Minh-Khoi Nguyen-Nhat, Anh-Huy Dinh, Quang-Tri Le, Manh-Thien Nguyen, Anh Duy Tran, Minh-Triet Tran, Duc-Tien Dang-Nguyen. 132-144
- LUMOS-DM: Landscape-Based Multimodal Scene Retrieval Enhanced by Diffusion Model. Viet-Tham Huynh, Trong Thuan Nguyen, Quang-Thuc Nguyen, Mai-Khiem Tran, Tam V. Nguyen 0002, Minh-Triet Tran. 145-158
- Mining Landmark Images for Scene Reconstruction from Weakly Annotated Video Collections. Helmut Neuschmied, Werner Bailer. 161-174
- A Framework for 3D Modeling of Construction Sites Using Aerial Imagery and Semantic NeRFs. Panagiotis Vrachnos, Marios Krestenitis, Ilias Koulalis, Konstantinos Ioannidis, Stefanos Vrochidis. 175-187
- Multimodal 3D Object Retrieval. Maria Pegia, Björn Þór Jónsson 0001, Anastasia Moumtzidou, Sotiris Diplaris, Ilias Gialampoukidis, Stefanos Vrochidis, Ioannis Kompatsiaris. 188-201
- An Integrated System for Spatio-temporal Summarization of 360-Degrees Videos. Ioannis Kontostathis, Evlampios Apostolidis, Vasileios Mezaris. 202-215
- Mutant Texts: A Technique for Uncovering Unexpected Inconsistencies in Large-Scale Vision-Language Models. Mingliang Liang, Zhouran Liu, Martha A. Larson. 219-233
- Exploring Artificial Intelligence for Advancing Performance Processes and Events in Io3MT. Rômulo Vieira, Débora C. Muchaluat-Saade, Pablo César. 234-248
- Implementation of Melody Slot Machines. Masatoshi Hamanaka. 251-257
- E2Evideo: End to End Video and Image Pre-processing and Analysis Tool. Faiga Alawad, Pål Halvorsen, Michael A. Riegler. 258-264
- Augmented Reality Photo Presentation and Content-Based Image Retrieval on Mobile Devices with AR-Explorer. Loris Sauter, Tim Bachmann, Heiko Schuldt, Luca Rossetto. 265-270
- Facilitating the Production of Well-Tailored Video Summaries for Sharing on Social Media. Evlampios Apostolidis, Konstantinos Apostolidis, Vasileios Mezaris. 271-278
- AI-Based Cropping of Soccer Videos for Different Social Media Representations. Mehdi Houshmand Sarkhoosh, Sayed Mohammad Majidi Dorcheh, Cise Midoglu, Saeed Shafiee Sabet, Tomas Kupka, Dag Johansen, Michael A. Riegler, Pål Halvorsen. 279-287
- Few-Shot Object Detection as a Service: Facilitating Training and Deployment for Domain Experts. Werner Bailer, Mihai Dogariu, Bogdan Ionescu, Hannes Fassold. 288-294
- DatAR: Supporting Neuroscience Literature Exploration by Finding Relations Between Topics in Augmented Reality. Boyu Xu, Ghazaleh Tanhaei, Lynda Hardman, Wolfgang Hürst. 295-300
- EmoAda: A Multimodal Emotion Interaction and Psychological Adaptation System. Tengteng Dong, Fangyuan Liu, Xinke Wang, Yishun Jiang, Xiwei Zhang, Xiao Sun 0003. 301-307
- Waseda_Meisei_SoftBank at Video Browser Showdown 2024. Takayuki Hori, Kazuya Ueki, Yuma Suzuki, Hiroki Takushima, Hayato Tanoue, Haruki Sato, Takumi Takada, Aiswariya Manoj Kumar. 311-316
- Exploring Multimedia Vector Spaces with vitrivr-VR. Florian Spiess, Luca Rossetto, Heiko Schuldt. 317-323
- A New Retrieval Engine for Vitrivr. Ralph Gasser, Rahel Arnold, Fynn Faber, Heiko Schuldt, Raphael Waltenspül, Luca Rossetto. 324-331
- VISIONE 5.0: Enhanced User Interface and AI Models for VBS2024. Giuseppe Amato 0001, Paolo Bolettieri, Fabio Carrara, Fabrizio Falchi, Claudio Gennaro, Nicola Messina, Lucia Vadicamo, Claudio Vairo. 332-339
- PraK Tool: An Interactive Search Tool Based on Video Data Services. Jakub Lokoc, Zuzana Vopálková, Michael Stroh, Raphael Buchmueller, Udo Schlegel. 340-346
- Exquisitor at the Video Browser Showdown 2024: Relevance Feedback Meets Conversational Search. Omar Shahbaz Khan, Hongyi Zhu, Ujjwal Sharma, Evangelos Kanoulas, Stevan Rudinac, Björn Þór Jónsson 0001. 347-355
- VERGE in VBS 2024. Nick Pantelidis, Maria Pegia, Damianos Galanopoulos, Konstantinos Apostolidis, Klearchos Stavrothanasopoulos, Anastasia Moumtzidou, Konstantinos Gkountakos, Ilias Gialampoukidis, Stefanos Vrochidis, Vasileios Mezaris, Ioannis Kompatsiaris, Björn Þór Jónsson 0001. 356-363
- Optimizing the Interactive Video Retrieval Tool Vibro for the Video Browser Showdown 2024. Konstantin Schall, Nico Hezel, Kai-Uwe Barthel, Klaus Jung. 364-371
- DiveXplore at the Video Browser Showdown 2024. Klaus Schoeffmann, Sahar Nasirihaghighi. 372-379
- Leveraging LLMs and Generative Models for Interactive Known-Item Video Search. Zhixin Ma, Jiaxin Wu, Chong-Wah Ngo. 380-386
- TalkSee: Interactive Video Retrieval Engine Using Large Language Model. Guihe Gu, Zhengqian Wu, Jiangshan He, Lin Song, Zhongyuan Wang 0001, Chao Liang. 387-393
- VideoCLIP 2.0: An Interactive CLIP-Based Video Retrieval System for Novice Users at VBS2024. Thao-Nhu Nguyen, Le Minh Quang, Graham Healy, Binh T. Nguyen 0001, Cathal Gurrin. 394-399
- ViewsInsight: Enhancing Video Retrieval for VBS 2024 with a User-Friendly Interaction Mechanism. Gia-Huy Vuong, Van-Son Ho, Tien-Thanh Nguyen-Dang, Xuan-Dang Thai, Tu-Khiem Le, Minh-Khoi Pham, Van-Tu Ninh, Cathal Gurrin, Minh-Triet Tran. 400-406