Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlapping

Chenyu Jiang, Ye Tian, Zhen Jia 0001, Shuai Zheng 0004, Chuan Wu 0001, Yida Wang 0003. Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlapping. In Phillip B. Gibbons, Gennady Pekhimenko, Christopher De Sa, editors, Proceedings of the Seventh Annual Conference on Machine Learning and Systems, MLSys 2024, Santa Clara, CA, USA, May 13-16, 2024. mlsys.org, 2024.

Authors

Chenyu Jiang

Ye Tian

Zhen Jia 0001

Shuai Zheng 0004

Chuan Wu 0001

Yida Wang 0003
