NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference

Xuanlin Jiang, Yang Zhou, Shiyi Cao, Ion Stoica, Minlan Yu. NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference. In Matei Zaharia, Gauri Joshi, Yingyan (Celine) Lin, editors, Proceedings of the Eighth Conference on Machine Learning and Systems (MLSys 2025), Santa Clara, CA, USA, May 12-15, 2025. OpenReview.net/mlsys.org, 2025.
