Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters

Shaohuai Shi, Xianhao Zhou, Shutao Song, Xingyao Wang, Zilin Zhu, Xue Huang, Xinan Jiang, Feihu Zhou, Zhenyu Guo, Liqiang Xie, Rui Lan, Xianbin Ouyang, Yan Zhang, Jieqian Wei, Jing Gong, Weiliang Lin, Ping Gao, Peng Meng, Xiaomin Xu, Chenyang Guo, Bo Yang, Zhibo Chen, Yongjian Wu, Xiaowen Chu. Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters. In Alex Smola, Alex Dimakis, Ion Stoica, editors, Proceedings of Machine Learning and Systems 2021, MLSys 2021, virtual, April 5-9, 2021. mlsys.org, 2021.

@inproceedings{ShiZSWZHJZGXLOZ21,
  title = {Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters},
  author = {Shaohuai Shi and Xianhao Zhou and Shutao Song and Xingyao Wang and Zilin Zhu and Xue Huang and Xinan Jiang and Feihu Zhou and Zhenyu Guo and Liqiang Xie and Rui Lan and Xianbin Ouyang and Yan Zhang and Jieqian Wei and Jing Gong and Weiliang Lin and Ping Gao and Peng Meng and Xiaomin Xu and Chenyang Guo and Bo Yang and Zhibo Chen and Yongjian Wu and Xiaowen Chu},
  year = {2021},
  url = {https://proceedings.mlsys.org/paper/2021/hash/8613985ec49eb8f757ae6439e879bb2a-Abstract.html},
  researchr = {https://researchr.org/publication/ShiZSWZHJZGXLOZ21},
  cites = {0},
  citedby = {0},
  booktitle = {Proceedings of Machine Learning and Systems 2021, MLSys 2021, virtual, April 5-9, 2021},
  editor = {Alex Smola and Alex Dimakis and Ion Stoica},
  publisher = {mlsys.org},
}