Improving Transformer with an Admixture of Attention Heads

Tan Nguyen, Tam Nguyen, Hai Do, Khai Nguyen, Vishwanath Saragadam, Minh Pham, Duy Khuong Nguyen, Nhat Ho, Stanley J. Osher. Improving Transformer with an Admixture of Attention Heads. In Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, A. Oh, editors, Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022. 2022.
