Themis: a network bandwidth-aware collective scheduling policy for distributed training of DL models

Saeed Rashidi, William Won, Sudarshan Srinivasan, Srinivas Sridharan, Tushar Krishna. Themis: a network bandwidth-aware collective scheduling policy for distributed training of DL models. In Valentina Salapura, Mohamed Zahran, Fred Chong, Lingjia Tang, editors, ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18-22, 2022, pages 581-596. ACM, 2022.