MoEfication: Transformer Feed-forward Layers are Mixtures of Experts

Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou. MoEfication: Transformer Feed-forward Layers are Mixtures of Experts. In Smaranda Muresan, Preslav Nakov, Aline Villavicencio, editors, Findings of the Association for Computational Linguistics: ACL 2022, Dublin, Ireland, May 22-27, 2022, pages 877-890. Association for Computational Linguistics, 2022.
