Shiyi Cao, Shu Liu, Tyler Griggs, Peter Schafhalter, Xiaoxuan Liu, Ying Sheng, Joseph E. Gonzalez, Matei Zaharia, Ion Stoica. MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs. In Lieven Eeckhout, Georgios Smaragdakis, Kaitai Liang, Adrian Sampson, Martha A. Kim, Christopher J. Rossbach, editors, Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, ASPLOS 2025, Rotterdam, The Netherlands, 30 March 2025 - 3 April 2025, pages 715-730. ACM, 2025.