The following publications are possibly variants of this publication:
- EAC-MoE: Expert-Selection Aware Compressor for Mixture-of-Experts Large Language Models. Yuanteng Chen, Yuantian Shao, Peisong Wang, Jian Cheng 0001. acl 2025: 12942-12963 [doi]
- DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale. Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He. icml 2022: 18332-18346 [doi]
- Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts. Xiaoming Shi, Shiyu Wang, Yuqi Nie, Dianqi Li, Zhou Ye, Qingsong Wen, Ming Jin. iclr 2025: [doi]
- Uni-MoE: Scaling Unified Multimodal LLMs With Mixture of Experts. Yunxin Li, Shenyuan Jiang, Baotian Hu, Longyue Wang, Wanqi Zhong, Wenhan Luo, Lin Ma 0002, Min Zhang 0005. pami, 47(5):3424-3439, May 2025. [doi]
- Efficient Large Scale Language Modeling with Mixtures of Experts. Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, Sam Shleifer, Xi Victoria Lin, Jingfei Du, Srinivasan Iyer, Ramakanth Pasunuru, Giridharan Anantharaman, Xian Li, Shuohui Chen, Halil Akin, Mandeep Baines, Louis Martin, Xing Zhou, Punit Singh Koura, Brian O'Horo, Jeffrey Wang, Luke Zettlemoyer, Mona T. Diab, Zornitsa Kozareva, Veselin Stoyanov. emnlp 2022: 11699-11732 [doi]
- SiDA: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models. Zhixu Du, Shiyu Li, Yuhao Wu, Xiangyu Jiang, Jingwei Sun 0002, Qilin Zheng, Yongkai Wu, Ang Li 0005, Hai Li 0001, Yiran Chen 0001. mlsys 2024: [doi]
- Prophet: Fine-grained Load Balancing for Parallel Training of Large-scale MoE Models. Wei Wang, Zhiquan Lai, Shengwei Li, Weijie Liu, Keshi Ge, Yujie Liu, Ao Shen, Dongsheng Li 0001. cluster 2023: 82-94 [doi]