DistSim: A performance model of large-scale hybrid distributed DNN training

Guandong Lu, Runzhe Chen, Yakai Wang, Yangjie Zhou, Rui Zhang, Zheng Hu, Yanming Miao, Zhifang Cai, Li Li, Jingwen Leng, Minyi Guo. DistSim: A performance model of large-scale hybrid distributed DNN training. In Andrea Bartolini, Kristian F. D. Rietveld, Catherine D. Schuman, Jose Moreira, editors, Proceedings of the 20th ACM International Conference on Computing Frontiers, CF 2023, Bologna, Italy, May 9-11, 2023, pages 112-122. ACM, 2023.
