ElasticFlow: An Elastic Serverless Training Platform for Distributed Deep Learning

Diandian Gu, Yihao Zhao, Yinmin Zhong, Yifan Xiong, Zhenhua Han, Peng Cheng, Fan Yang, Gang Huang, Xin Jin, Xuanzhe Liu. ElasticFlow: An Elastic Serverless Training Platform for Distributed Deep Learning. In Tor M. Aamodt, Natalie D. Enright Jerger, Michael M. Swift, editors, Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, ASPLOS 2023, Vancouver, BC, Canada, March 25-29, 2023, pages 266-280. ACM, 2023.