Haiyang Xu, Haoxiang Liu, Wei Gong, Xianjun Deng, Hai Wang. Sparse Mixture of Experts Language Models Excel in Knowledge Distillation. In Derek F. Wong, Zhongyu Wei, Muyun Yang, editors, Natural Language Processing and Chinese Computing - 13th National CCF Conference, NLPCC 2024, Hangzhou, China, November 1-3, 2024, Proceedings, Part III. Volume 15361 of Lecture Notes in Computer Science, pages 80-91, Springer, 2024.