A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training

Siddharth Singh, Olatunji Ruwase, Ammar Ahmad Awan, Samyam Rajbhandari, Yuxiong He, Abhinav Bhatele. A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training. In Kyle A. Gallivan, Efstratios Gallopoulos, Dimitrios S. Nikolopoulos, Ramón Beivide, editors, Proceedings of the 37th International Conference on Supercomputing, ICS 2023, Orlando, FL, USA, June 21-23, 2023. pages 203-214, ACM, 2023.
