A Mixture of h - 1 Heads is Better than h Heads

Hao Peng, Roy Schwartz, Dianqi Li, Noah A. Smith. A Mixture of h - 1 Heads is Better than h Heads. In Dan Jurafsky, Joyce Chai, Natalie Schluter, Joel R. Tetreault, editors, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, pages 6566-6577. Association for Computational Linguistics, 2020.

Abstract

Abstract is missing.