Stacked cross-modal feature consolidation attention networks for image captioning

Mozhgan PourKeshavarz, Shahabedin Nabavi, Mohsen Ebrahimi Moghaddam, Mehrnoush Shamsfard. Stacked cross-modal feature consolidation attention networks for image captioning. Multimedia Tools Appl., 83(4):12209-12233, January 2024.
