AAN+: Generalized Average Attention Network for Accelerating Neural Transformer

Biao Zhang, Deyi Xiong, Yubin Ge, Junfeng Yao, Hao Yue, Jinsong Su. AAN+: Generalized Average Attention Network for Accelerating Neural Transformer. J. Artif. Intell. Res. (JAIR), 75:677-708, 2022.
