The following publications are possibly variants of this publication:
- dPRO: A Generic Performance Diagnosis and Optimization Toolkit for Expediting Distributed DNN Training. Hanpeng Hu, Chenyu Jiang, Yuchen Zhong, Yanghua Peng, Chuan Wu 0001, Yibo Zhu, Haibin Lin, Chuanxiong Guo. mlsys 2022.
- Expediting Distributed DNN Training With Device Topology-Aware Graph Deployment. Shiwei Zhang, Xiaodong Yi 0001, Lansong Diao, Chuan Wu 0001, Siyu Wang, Wei Lin 0016. tpds, 34(4):1281-1293, April 2023.
- Elastic Averaging for Efficient Pipelined DNN Training. Zihao Chen, Chen Xu 0001, Weining Qian, Aoying Zhou. ppopp 2023: 380-391.
- FreezePipe: An Efficient Dynamic Pipeline Parallel Approach Based on Freezing Mechanism for Distributed DNN Training. Caishan Weng, Zhiyang Shu, Zhengjia Xu, Jinghui Zhang, Junzhou Luo, Fang Dong 0001, Peng Wang, Zhengang Wang. cscwd 2023: 303-308.
- Optimizing Resource Allocation in Pipeline Parallelism for Distributed DNN Training. Yubin Duan, Jie Wu 0001. icpads 2023: 161-168.
- SmartPipe: Intelligently Freezing Layers in Pipeline Parallelism for Distributed DNN Training. Nadia Niknami, Abdalaziz Sawwan, Jie Wu 0001. icpads 2023: 1885-1894.
- PipePar: A Pipelined Hybrid Parallel Approach for Accelerating Distributed DNN Training. Jiange Li, Yuchen Wang, Jinghui Zhang, Jiahui Jin, Fang Dong 0001, Lei Qian. cscwd 2021: 470-475.
- Swift: Expedited Failure Recovery for Large-Scale DNN Training. Yuchen Zhong, Guangming Sheng, Juncheng Liu, Jinhui Yuan, Chuan Wu 0001. ppopp 2023: 447-449.
- Efficient All-Reduce for Distributed DNN Training in Optical Interconnect Systems. Fei Dai, Yawen Chen 0001, Zhiyi Huang 0001, Haibo Zhang 0001, Fangfang Zhang 0002. ppopp 2023: 422-424.