SmartMoE: Efficiently Training Sparsely-Activated Models through Combining Offline and Online Parallelization

Mingshu Zhai, Jiaao He, Zixuan Ma, Zan Zong, Runqing Zhang, Jidong Zhai. SmartMoE: Efficiently Training Sparsely-Activated Models through Combining Offline and Online Parallelization. In Julia Lawall, Dan Williams, editors, 2023 USENIX Annual Technical Conference, USENIX ATC 2023, Boston, MA, USA, July 10-12, 2023. pages 961-975, USENIX Association, 2023.

Abstract

Abstract is missing.