The following publications are possibly variants of this publication:
- DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale. Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He. ICML 2022: 18332-18346
- Efficient Large Scale Language Modeling with Mixtures of Experts. Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, Sam Shleifer, Xi Victoria Lin, Jingfei Du, Srinivasan Iyer, Ramakanth Pasunuru, Giridharan Anantharaman, Xian Li, Shuohui Chen, Halil Akin, Mandeep Baines, Louis Martin, Xing Zhou, Punit Singh Koura, Brian O'Horo, Jeffrey Wang, Luke Zettlemoyer, Mona T. Diab, Zornitsa Kozareva, Veselin Stoyanov. EMNLP 2022: 11699-11732
- Prophet: Fine-grained Load Balancing for Parallel Training of Large-scale MoE Models. Wei Wang, Zhiquan Lai, Shengwei Li, Weijie Liu, Keshi Ge, Yujie Liu, Ao Shen, Dongsheng Li. CLUSTER 2023: 82-94
- SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts. Zhao You, Shulin Feng, Dan Su, Dong Yu. INTERSPEECH 2021: 2077-2081
- Deep Mixture of Diverse Experts for Large-Scale Visual Recognition. Tianyi Zhao, Qiuyu Chen, Zhenzhong Kuang, Jun Yu, Wei Zhang, Jianping Fan. PAMI, 41(5):1072-1087, 2019.