Taming unbalanced training workloads in deep learning with partial collective operations

Shigang Li, Tal Ben-Nun, Salvatore Di Girolamo, Dan Alistarh, Torsten Hoefler. Taming unbalanced training workloads in deep learning with partial collective operations. In Rajiv Gupta, Xipeng Shen, editors, PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, California, USA, February 22-26, 2020. Pages 45-61, ACM, 2020.
