Holmes: Towards Distributed Training Across Clusters with Heterogeneous NIC Environment

Fei Yang, Shuang Peng, Ning Sun, Fangyu Wang, Yuanyuan Wang, Fu Wu, Jiezhong Qiu, Aimin Pan. Holmes: Towards Distributed Training Across Clusters with Heterogeneous NIC Environment. In Proceedings of the 53rd International Conference on Parallel Processing, ICPP 2024, Gotland, Sweden, August 12-15, 2024. pages 514-523, ACM, 2024.
