Using Probabilistic Characterization to Reduce Runtime Faults in HPC Systems

Jim M. Brandt, Bert J. Debusschere, Ann C. Gentile, Jackson Mayo, Philippe P. Pébay, David Thompson, Matthew Wong. Using Probabilistic Characterization to Reduce Runtime Faults in HPC Systems. In 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), 19-22 May 2008, Lyon, France. pages 759-764, IEEE Computer Society, 2008.
