MoEoM: Joint Compute and Memory-Aware Balancing for Fast MoE Inference

Ziqi Gong, Yitao Hu, Sheng Chen, Wenxin Li, Keqiu Li. MoEoM: Joint Compute and Memory-Aware Balancing for Fast MoE Inference. In 31st IEEE International Conference on Parallel and Distributed Systems, ICPADS 2025, Hefei, China, December 14-18, 2025. pages 1-10, IEEE, 2025.

Abstract

Abstract is missing.