Failure Data-Driven Selective Node-Level Duplication to Improve MTTF in High Performance Computing Systems

Nithin Nakka, Alok N. Choudhary. Failure Data-Driven Selective Node-Level Duplication to Improve MTTF in High Performance Computing Systems. In Douglas J. K. Mewhort, Natalie M. Cann, Gary W. Slater, Thomas J. Naughton, editors, High Performance Computing Systems and Applications, 23rd International Symposium, HPCS 2009, Kingston, ON, Canada, June 14-17, 2009, Revised Selected Papers. Volume 5976 of Lecture Notes in Computer Science, pages 304-322, Springer, 2009.
