FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement

Xiaonan Nie, Xupeng Miao, Zilong Wang, Zichao Yang, Jilong Xue, Lingxiao Ma, Gang Cao, Bin Cui. FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement. Proc. ACM Manag. Data, 1(1), 2023.

Authors

Xiaonan Nie

Xupeng Miao

Zilong Wang

Zichao Yang

Jilong Xue

Lingxiao Ma

Gang Cao

Bin Cui
