BERTs of a feather do not generalize together: Large variability in generalization across models with similar test set performance

R. Thomas McCoy, Junghyun Min, Tal Linzen. BERTs of a feather do not generalize together: Large variability in generalization across models with similar test set performance. In Afra Alishahi, Yonatan Belinkov, Grzegorz Chrupala, Dieuwke Hupkes, Yuval Pinter, Hassan Sajjad, editors, Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, BlackboxNLP@EMNLP 2020, Online, November 2020, pages 217-227. Association for Computational Linguistics, 2020.
