Understanding and Exploiting Spatial Properties of System Failures on Extreme-Scale HPC Systems

Saurabh Gupta, Devesh Tiwari, Christopher Jantzi, James H. Rogers, Don Maxwell. Understanding and Exploiting Spatial Properties of System Failures on Extreme-Scale HPC Systems. In 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2015, Rio de Janeiro, Brazil, June 22-25, 2015. pages 37-44, IEEE, 2015.
