Jinghan Yao, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda. Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference. In IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024, San Francisco, CA, USA, May 27-31, 2024, pages 915-925. IEEE, 2024.