Reliability of Large Scale GPU Clusters for Deep Learning Workloads

Junjie Qian, Taeyoon Kim, Myeongjae Jeon. Reliability of Large Scale GPU Clusters for Deep Learning Workloads. In Jure Leskovec, Marko Grobelnik, Marc Najork, Jie Tang, Leila Zia, editors, Companion of The Web Conference 2021, Virtual Event / Ljubljana, Slovenia, April 19-23, 2021. Pages 179-181, ACM / IW3C2, 2021.
