Mixhead: Breaking the low-rank bottleneck in multi-head attention language models

Zhong Zhang, Nian Shao, Chongming Gao, Rui Miao, Qinli Yang, Junming Shao. Mixhead: Breaking the low-rank bottleneck in multi-head attention language models. Knowledge-Based Systems, 240:108075, 2022.
