ExeGPT: Constraint-Aware Resource Scheduling for LLM Inference

Hyungjun Oh, Kihong Kim, Jaemin Kim, Sungkyun Kim, Junyeol Lee, Du-Seong Chang, Jiwon Seo. ExeGPT: Constraint-Aware Resource Scheduling for LLM Inference. In Rajiv Gupta, Nael B. Abu-Ghazaleh, Madan Musuvathi, Dan Tsafrir, editors, Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, ASPLOS 2024, La Jolla, CA, USA, 27 April - 1 May 2024, pages 369-384. ACM, 2024.
