VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

Wenhai Wang, Zhe Chen, Xiaokang Chen, Jiannan Wu, Xizhou Zhu, Gang Zeng, Ping Luo, Tong Lu, Jie Zhou, Yu Qiao, Jifeng Dai. VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, Sergey Levine, editors, Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023.

Authors

Wenhai Wang

Zhe Chen

Xiaokang Chen

Jiannan Wu

Xizhou Zhu

Gang Zeng

Ping Luo

Tong Lu

Jie Zhou

Yu Qiao

Jifeng Dai
