AntDT: A Self-Adaptive Distributed Training Framework for Leader and Straggler Nodes

Youshao Xiao, Lin Ju, Zhenglei Zhou, Siyuan Li, Zhaoxin Huan, Dalong Zhang, Rujie Jiang, Lin Wang, Xiaolu Zhang, Lei Liang, Jun Zhou. AntDT: A Self-Adaptive Distributed Training Framework for Leader and Straggler Nodes. In 40th IEEE International Conference on Data Engineering, ICDE 2024, Utrecht, The Netherlands, May 13-16, 2024, pages 5238-5251. IEEE, 2024.
