Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlapping

Chenyu Jiang, Ye Tian, Zhen Jia, Shuai Zheng, Chuan Wu, Yida Wang. Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlapping. In Phillip B. Gibbons, Gennady Pekhimenko, Christopher De Sa, editors, Proceedings of the Seventh Annual Conference on Machine Learning and Systems, MLSys 2024, Santa Clara, CA, USA, May 13-16, 2024. mlsys.org, 2024.

Abstract

Abstract is missing.