Computer Vision - ACCV 2024 - 17th Asian Conference on Computer Vision, Hanoi, Vietnam, December 8-12, 2024, Proceedings, Part III

Minsu Cho, Ivan Laptev, Du Tran, Angela Yao, Hongbin Zha, editors, Computer Vision - ACCV 2024 - 17th Asian Conference on Computer Vision, Hanoi, Vietnam, December 8-12, 2024, Proceedings, Part III. Volume 15474 of Lecture Notes in Computer Science, Springer, 2025.

Conference: ACCV2025

LocoMotion: Learning Motion-Focused Video-Language Representations. Hazel Doughty, Fida Mohammad Thoker, Cees G. M. Snoek. 3-24

Beyond Coarse-Grained Matching in Video-Text Retrieval. Aozhu Chen, Hazel Doughty, Xirong Li, Cees G. M. Snoek. 25-43

TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models. Rabin Adhikari, Safal Thapaliya, Manish Dhakal, Bishesh Khanal. 44-62

Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names. Ragav Sachdeva, Gyungin Shin, Andrew Zisserman. 63-80

AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description. Junyu Xie, Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman. 81-97

MedBLIP: Bootstrapping Language-Image Pretraining from 3D Medical Images and Texts. Qiuhui Chen, Yi Hong. 98-113

OneDiff: A Generalist Model for Image Difference Captioning. Erdong Hu, Longteng Guo, Tongtian Yue, Zijia Zhao, Shuning Xue, Jing Liu. 114-130

Enhancing Anchor-Based Weakly Supervised Referring Expression Comprehension with Cross-Modality Attention. Ting-Yu Chu, Yong-Xiang Lin, Ching-Chun Huang, Kai-Lung Hua. 131-147

Diffusion-Based Multimodal Video Captioning. Jaakko Kainulainen, Zixin Guo, Jorma Laaksonen. 148-165

Deneb: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning. Kazuki Matsuda, Yuiga Wada, Komei Sugiura. 166-182

M-RAT: a Multi-grained Retrieval Augmentation Transformer for Image Captioning. Jiayan Song, Renjie Pan, Jun Zhou, Hua Yang. 185-203

Fine-Tuning Large Language Models for Automatic Font Skeleton Generation: Exploration and Analysis. Yuxuan Liu, Yasuhisa Fujii, Xinru Zhu, Kayoko Nohara. 204-219

Capture Concept Through Comparison: Vision-and-Language Representation Learning with Intrinsic Information Mining. Yun-Zhu Song, Yi-Syuan Chen, Tzu-Ling Lin, Bei Liu, Jianlong Fu, Hong-Han Shuai. 220-238

Do They Share the Same Tail? Learning Individual Compositional Attribute Prototype for Generalized Zero-Shot Learning. Yuyan Shi, Chenyi Jiang, Run Shi, Haofeng Zhang. 239-256

BiEfficient: Bidirectionally Prompting Vision-Language Models for Parameter-Efficient Video Recognition. Haichen He, Weibin Liu, Weiwei Xing. 257-274

It's Just Another Day: Unique Video Captioning by Discriminative Prompting. Toby Perrett, Tengda Han, Dima Damen, Andrew Zisserman. 275-293

Parameter-Efficient Instance-Adaptive Neural Video Compression. Seungjun Oh, Hyunmo Yang, Eunbyung Park. 294-311

VideoPatchCore: An Effective Method to Memorize Normality for Video Anomaly Detection. Sunghyun Ahn, Youngwan Jo, Kijung Lee, Sanghyun Park. 312-328

Scene-Adaptive SVAD Based On Multi-modal Action-Based Feature Extraction. Shibo Gao, Peipei Yang, LinLin Huang. 329-346

3D-Aware Instance Segmentation and Tracking in Egocentric Videos. Yash Bhalgat, Vadim Tschernezki, Iro Laina, João F. Henriques, Andrea Vedaldi, Andrew Zisserman. 347-364

Character-Aware Audio-Visual Subtitling in Context. Jaesung Huh, Andrew Zisserman. 365-383

Every Shot Counts: Using Exemplars for Repetition Counting in Videos. Saptarshi Sinha, Alexandros Stergiou, Dima Damen. 384-402

Continual Learning Improves Zero-Shot Action Recognition. Shreyank N. Gowda, Davide Moltisanti, Laura Sevilla-Lara. 403-421

TAPS: Temporal Attention-Based Pruning and Scaling for Efficient Video Action Recognition. Yonatan Dinai, Avraham Raviv, Nimrod Harel, Donghoon Kim, Ishay Goldin, Niv Zehngut. 422-438

Text Query to Web Image to Video: A Comprehensive Ad-Hoc Video Search. Nhat-Minh Nguyen, Tien-Dung Mai, Duy-Dinh Le. 439-453

Telling Stories for Common Sense Zero-Shot Action Recognition. Shreyank N. Gowda, Laura Sevilla-Lara. 454-471