Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Hang Zhang, Xin Li, Lidong Bing. Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding. In Yansong Feng, Els Lefever, editors, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023 - System Demonstrations, Singapore, December 6-10, 2023. pages 543-553, Association for Computational Linguistics, 2023.
