Byoungjip Kim, Dasol Hwang, Sungjun Cho, Youngsoo Jang, Honglak Lee, Moontae Lee. Show, Think, and Tell: Thought-Augmented Fine-Tuning of Large Language Models for Video Captioning. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 - Workshops, Seattle, WA, USA, June 17-18, 2024. pages 1808-1817, IEEE, 2024.