Heting Liu, Zhichao Li, Cheng Tan, Rongqiu Yang, Guohong Cao, Zherui Liu, Chuanxiong Guo. Predicting GPU Failures With High Precision Under Deep Learning Workloads. In Yosef Moatti, Ofer Biran, Yossi Gilad, Dejan Kostic, editors, Proceedings of the 16th ACM International Conference on Systems and Storage, SYSTOR 2023, Haifa, Israel, June 5-7, 2023. Pages 124-135, ACM, 2023.