VITR: Augmenting Vision Transformers with Relation-Focused Learning for Cross-modal Information Retrieval

Yan Gong, Georgina Cosma, Axel Finke. VITR: Augmenting Vision Transformers with Relation-Focused Learning for Cross-modal Information Retrieval. TKDD, 18(9), November 2024.