Monitoring and Predicting Hardware Failures in HPC Clusters with FTB-IPMI

Raghunath Rajachandrasekar, Xavier Besseron, Dhabaleswar K. Panda. Monitoring and Predicting Hardware Failures in HPC Clusters with FTB-IPMI. In 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, IPDPS 2012, Shanghai, China, May 21-25, 2012, pages 1136-1143. IEEE Computer Society, 2012.

Abstract

Abstract is missing.