Generalist YOLO: Towards Real-Time End-to-End Multi-Task Visual Language Models

Hung-Shuo Chang, Chien-Yao Wang, Richard Robert Wang, Gene Chou, Hong-Yuan Mark Liao. Generalist YOLO: Towards Real-Time End-to-End Multi-Task Visual Language Models. In IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025, Tucson, AZ, USA, February 26 - March 6, 2025. pages 6217-6227, IEEE, 2025.

Abstract

Abstract is missing.