Attention Weights in Transformer NMT Fail Aligning Words Between Sequences but Largely Explain Model Predictions

Javier Ferrando, Marta R. Costa-Jussà. Attention Weights in Transformer NMT Fail Aligning Words Between Sequences but Largely Explain Model Predictions. In Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih, editors, Findings of the Association for Computational Linguistics: EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 16-20 November 2021. pages 434-443, Association for Computational Linguistics, 2021.

Abstract

Abstract is missing.