The following publications are possibly variants of this publication:
- HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis. Shiwei Zhang, Lansong Diao, Chuan Wu 0001, Zongyan Cao, Siyu Wang, Wei Lin 0016. eurosys 2024: 524-541 [doi]
- PipePar: Enabling fast DNN pipeline parallel training in heterogeneous GPU clusters. Jinghui Zhang, Geng Niu, Qiangsheng Dai, Haorui Li, Zhihua Wu, Fang Dong, Zhiang Wu 0001. ijon, 555:126661, October 2023. [doi]
- Near-Optimal Topology-adaptive Parameter Synchronization in Distributed DNN Training. Zhe Zhang, Chuan Wu 0001, Zongpeng Li. infocom 2021: 1-10 [doi]
- Momentum-driven adaptive synchronization model for distributed DNN training on HPC clusters. Zhaorui Zhang, Zhuoran Ji, Cho-Li Wang. jpdc, 159:65-84, 2022. [doi]