FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement

Xiaonan Nie, Xupeng Miao, Zilong Wang, Zichao Yang, Jilong Xue, Lingxiao Ma, Gang Cao, Bin Cui. FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement. Proc. ACM Manag. Data, 1(1), 2023. https://doi.org/10.1145/3588964

@article{NieMWYXMC023,
  title = {FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement},
  author = {Xiaonan Nie and Xupeng Miao and Zilong Wang and Zichao Yang and Jilong Xue and Lingxiao Ma and Gang Cao and Bin Cui},
  year = {2023},
  doi = {10.1145/3588964},
  url = {https://doi.org/10.1145/3588964},
  researchr = {https://researchr.org/publication/NieMWYXMC023},
  cites = {0},
  citedby = {0},
  journal = {Proc. ACM Manag. Data},
  volume = {1},
  number = {1},
}