Mixture of Attention Heads: Selecting Attention Heads Per Token

Xiaofeng Zhang, Yikang Shen, Zeyu Huang, Jie Zhou, Wenge Rong, Zhang Xiong. Mixture of Attention Heads: Selecting Attention Heads Per Token. In Yoav Goldberg, Zornitsa Kozareva, Yue Zhang, editors, Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, pages 4150-4162. Association for Computational Linguistics, 2022.

Abstract

Abstract is missing.