Fully-attentive iterative networks for region-based controllable image and video captioning

Marcella Cornia, Lorenzo Baraldi, Ayellet Tal, Rita Cucchiara. Fully-attentive iterative networks for region-based controllable image and video captioning. Computer Vision and Image Understanding, 237:103857, December 2023.