CLIPPO: Image-and-Language Understanding from Pixels Only

Michael Tschannen, Basil Mustafa, Neil Houlsby. CLIPPO: Image-and-Language Understanding from Pixels Only. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023. pages 11006-11017, IEEE, 2023.
