ElasticDL: A Kubernetes-native Deep Learning Framework with Fault-tolerance and Elastic Scheduling

Jun Zhou, Ke Zhang, Feng Zhu, Qitao Shi, Wenjing Fang, Lin Wang, Yi Wang. ElasticDL: A Kubernetes-native Deep Learning Framework with Fault-tolerance and Elastic Scheduling. In Tat-Seng Chua, Hady W. Lauw, Luo Si, Evimaria Terzi, Panayiotis Tsaparas, editors, Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, WSDM 2023, Singapore, 27 February 2023 - 3 March 2023, pages 1148-1151. ACM, 2023.

Authors

Jun Zhou

Ke Zhang

Feng Zhu

Qitao Shi

Wenjing Fang

Lin Wang

Yi Wang
