ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning

Xiao Xu, Bei Li, Chenfei Wu, Shao-Yen Tseng, Anahita Bhiwandiwalla, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan. ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning. In Anna Rogers, Jordan L. Boyd-Graber, Naoaki Okazaki, editors, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2023, Toronto, Canada, July 9-14, 2023, pages 14507-14525. Association for Computational Linguistics, 2023.

Authors

Xiao Xu

Bei Li

Chenfei Wu

Shao-Yen Tseng

Anahita Bhiwandiwalla

Shachar Rosenman

Vasudev Lal

Wanxiang Che

Nan Duan
