Transform, contrast and tell: Coherent entity-aware multi-image captioning

Jingqiang Chen. Transform, contrast and tell: Coherent entity-aware multi-image captioning. Computer Vision and Image Understanding, 238:103878, January 2024.
