Kale: Elastic GPU Scheduling for Online DL Model Training

Ziyang Liu, Renyu Yang, Jin Ouyang, Weihan Jiang, Tianyu Ye, Menghao Zhang, Sui Huang, Jiaming Huang, Chengru Song, Di Zhang, Tianyu Wo, Chunming Hu. Kale: Elastic GPU Scheduling for Online DL Model Training. In Proceedings of the 2024 ACM Symposium on Cloud Computing (SoCC 2024), Redmond, WA, USA, November 20-22, 2024, pages 36-51. ACM, 2024.
