SmartMoE: Efficiently Training Sparsely-Activated Models through Combining Offline and Online Parallelization

Mingshu Zhai, Jiaao He, Zixuan Ma, Zan Zong, Runqing Zhang, Jidong Zhai. SmartMoE: Efficiently Training Sparsely-Activated Models through Combining Offline and Online Parallelization. In Julia Lawall, Dan Williams, editors, 2023 USENIX Annual Technical Conference, USENIX ATC 2023, Boston, MA, USA, July 10-12, 2023, pages 961-975. USENIX Association, 2023.

Authors

Mingshu Zhai

Jiaao He

Zixuan Ma

Zan Zong

Runqing Zhang

Jidong Zhai
