Multimedia analysis of robustly optimized multimodal transformer based on vision and language co-learning

Junho Yoon, Gyu Ho Choi, Chang Choi. Multimedia analysis of robustly optimized multimodal transformer based on vision and language co-learning. Information Fusion, 100:101922, December 2023.

Abstract

Abstract is missing.