ElasticDL: A Kubernetes-native Deep Learning Framework with Fault-tolerance and Elastic Scheduling

Jun Zhou, Ke Zhang, Feng Zhu, Qitao Shi, Wenjing Fang, Lin Wang, Yi Wang. ElasticDL: A Kubernetes-native Deep Learning Framework with Fault-tolerance and Elastic Scheduling. In Tat-Seng Chua, Hady W. Lauw, Luo Si, Evimaria Terzi, Panayiotis Tsaparas, editors, Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, WSDM 2023, Singapore, 27 February 2023 - 3 March 2023, pages 1148-1151. ACM, 2023.
