Online Fault Classification in HPC Systems Through Machine Learning

Alessio Netti, Zeynep Kiziltan, Özalp Babaoglu, Alina Sîrbu, Andrea Bartolini, Andrea Borghesi. Online Fault Classification in HPC Systems Through Machine Learning. In Ramin Yahyapour, editor, Euro-Par 2019: Parallel Processing - 25th International Conference on Parallel and Distributed Computing, Göttingen, Germany, August 26-30, 2019, Proceedings. Volume 11725 of Lecture Notes in Computer Science, pages 3-16, Springer, 2019.
