ElasticDL: A Kubernetes-native Deep Learning Framework with Fault-tolerance and Elastic Scheduling

Jun Zhou, Ke Zhang, Feng Zhu, Qitao Shi, Wenjing Fang, Lin Wang, Yi Wang. ElasticDL: A Kubernetes-native Deep Learning Framework with Fault-tolerance and Elastic Scheduling. In Tat-Seng Chua, Hady W. Lauw, Luo Si, Evimaria Terzi, Panayiotis Tsaparas, editors, Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, WSDM 2023, Singapore, 27 February 2023 - 3 March 2023, pages 1148-1151. ACM, 2023. doi:10.1145/3539597.3573037

@inproceedings{ZhouZZSFWW23,
  title = {ElasticDL: A Kubernetes-native Deep Learning Framework with Fault-tolerance and Elastic Scheduling},
  author = {Jun Zhou and Ke Zhang and Feng Zhu and Qitao Shi and Wenjing Fang and Lin Wang and Yi Wang},
  year = {2023},
  doi = {10.1145/3539597.3573037},
  url = {https://doi.org/10.1145/3539597.3573037},
  researchr = {https://researchr.org/publication/ZhouZZSFWW23},
  cites = {0},
  citedby = {0},
  pages = {1148--1151},
  booktitle = {Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, WSDM 2023, Singapore, 27 February 2023 - 3 March 2023},
  editor = {Tat-Seng Chua and Hady W. Lauw and Luo Si and Evimaria Terzi and Panayiotis Tsaparas},
  publisher = {ACM},
  isbn = {978-1-4503-9407-9},
}