To what extent do human explanations of model behavior align with actual model behavior?

Grusha Prasad, Yixin Nie, Mohit Bansal, Robin Jia, Douwe Kiela, Adina Williams. To what extent do human explanations of model behavior align with actual model behavior?. In Jasmijn Bastings, Yonatan Belinkov, Emmanuel Dupoux, Mario Giulianelli, Dieuwke Hupkes, Yuval Pinter, Hassan Sajjad, editors, Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, BlackboxNLP@EMNLP 2021, Punta Cana, Dominican Republic, November 11, 2021. pages 1-14, Association for Computational Linguistics, 2021. [doi]

Abstract

Abstract is missing.