- LocoMotion: Learning Motion-Focused Video-Language Representations. Hazel Doughty, Fida Mohammad Thoker, Cees G. M. Snoek. 3-24
- Beyond Coarse-Grained Matching in Video-Text Retrieval. Aozhu Chen, Hazel Doughty, Xirong Li 0001, Cees G. M. Snoek. 25-43
- TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models. Rabin Adhikari, Safal Thapaliya, Manish Dhakal, Bishesh Khanal. 44-62
- Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names. Ragav Sachdeva, Gyungin Shin, Andrew Zisserman. 63-80
- AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description. Junyu Xie, Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman. 81-97
- MedBLIP: Bootstrapping Language-Image Pretraining from 3D Medical Images and Texts. Qiuhui Chen, Yi Hong. 98-113
- OneDiff: A Generalist Model for Image Difference Captioning. Erdong Hu, Longteng Guo, Tongtian Yue, Zijia Zhao, Shuning Xue, Jing Liu 0001. 114-130
- Enhancing Anchor-Based Weakly Supervised Referring Expression Comprehension with Cross-Modality Attention. Ting-Yu Chu, Yong-Xiang Lin, Ching-Chun Huang, Kai-Lung Hua. 131-147
- Diffusion-Based Multimodal Video Captioning. Jaakko Kainulainen, Zixin Guo, Jorma Laaksonen. 148-165
- Deneb: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning. Kazuki Matsuda, Yuiga Wada, Komei Sugiura. 166-182
- M-RAT: a Multi-grained Retrieval Augmentation Transformer for Image Captioning. Jiayan Song, Renjie Pan, Jun Zhou, Hua Yang. 185-203
- Fine-Tuning Large Language Models for Automatic Font Skeleton Generation: Exploration and Analysis. Yuxuan Liu, Yasuhisa Fujii, Xinru Zhu, Kayoko Nohara. 204-219
- Capture Concept Through Comparison: Vision-and-Language Representation Learning with Intrinsic Information Mining. Yun-Zhu Song, Yi-Syuan Chen, Tzu-Ling Lin, Bei Liu 0001, Jianlong Fu, Hong-Han Shuai. 220-238
- Do They Share the Same Tail? Learning Individual Compositional Attribute Prototype for Generalized Zero-Shot Learning. Yuyan Shi, Chenyi Jiang, Run Shi, Haofeng Zhang. 239-256
- BiEfficient: Bidirectionally Prompting Vision-Language Models for Parameter-Efficient Video Recognition. Haichen He, Weibin Liu, Weiwei Xing. 257-274
- It's Just Another Day: Unique Video Captioning by Discriminative Prompting. Toby Perrett, Tengda Han, Dima Damen, Andrew Zisserman. 275-293
- Parameter-Efficient Instance-Adaptive Neural Video Compression. Seungjun Oh, Hyunmo Yang, Eunbyung Park. 294-311
- VideoPatchCore: An Effective Method to Memorize Normality for Video Anomaly Detection. Sunghyun Ahn, Youngwan Jo, Kijung Lee, Sanghyun Park 0003. 312-328
- Scene-Adaptive SVAD Based On Multi-modal Action-Based Feature Extraction. Shibo Gao, Peipei Yang, LinLin Huang. 329-346
- 3D-Aware Instance Segmentation and Tracking in Egocentric Videos. Yash Bhalgat, Vadim Tschernezki, Iro Laina, João F. Henriques, Andrea Vedaldi, Andrew Zisserman. 347-364
- Character-Aware Audio-Visual Subtitling in Context. Jaesung Huh, Andrew Zisserman. 365-383
- Every Shot Counts: Using Exemplars for Repetition Counting in Videos. Saptarshi Sinha, Alexandros Stergiou, Dima Damen. 384-402
- Continual Learning Improves Zero-Shot Action Recognition. Shreyank N. Gowda, Davide Moltisanti, Laura Sevilla-Lara. 403-421
- TAPS: Temporal Attention-Based Pruning and Scaling for Efficient Video Action Recognition. Yonatan Dinai, Avraham Raviv, Nimrod Harel, Donghoon Kim, Ishay Goldin, Niv Zehngut. 422-438
- Text Query to Web Image to Video: A Comprehensive Ad-Hoc Video Search. Nhat-Minh Nguyen, Tien-Dung Mai, Duy-Dinh Le. 439-453
- Telling Stories for Common Sense Zero-Shot Action Recognition. Shreyank N. Gowda, Laura Sevilla-Lara. 454-471