TA-MoE: Topology-Aware Large Scale Mixture-of-Expert Training

Chang Chen, Min Li, Zhihua Wu, Dianhai Yu, Chao Yang. TA-MoE: Topology-Aware Large Scale Mixture-of-Expert Training. In Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, A. Oh, editors, Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022.

Authors

Chang Chen

Min Li

Zhihua Wu

Dianhai Yu

Chao Yang
