Layer-wise enhanced transformer with multi-modal fusion for image caption

Jingdan Li, Yi Wang, Dexin Zhao. Layer-wise enhanced transformer with multi-modal fusion for image caption. Multimedia Syst., 29(3):1043-1056, June 2023.
