Hare: Exploiting Inter-job and Intra-job Parallelism of Distributed Machine Learning on Heterogeneous GPUs

Fahao Chen, Peng Li, Celimuge Wu, Song Guo. Hare: Exploiting Inter-job and Intra-job Parallelism of Distributed Machine Learning on Heterogeneous GPUs. In Jon B. Weissman, Abhishek Chandra, Ada Gavrilovska, Devesh Tiwari, editors, HPDC '22: The 31st International Symposium on High-Performance Parallel and Distributed Computing, Minneapolis, MN, USA, 27 June 2022 - 1 July 2022. pages 253-264, ACM, 2022.