A Case for Epidemic Fault Detection and Group Membership in HPC Storage Systems

Shane Snyder, Philip H. Carns, Jonathan Jenkins, Kevin Harms, Robert B. Ross, Misbah Mubarak, Christopher D. Carothers. A Case for Epidemic Fault Detection and Group Membership in HPC Storage Systems. In Stephen A. Jarvis, Steven A. Wright, Simon D. Hammond, editors, High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation - 5th International Workshop, PMBS 2014, New Orleans, LA, USA, November 16, 2014. Revised Selected Papers. Volume 8966 of Lecture Notes in Computer Science, pages 237-248, Springer, 2014.

Authors

Shane Snyder

Philip H. Carns

Jonathan Jenkins

Kevin Harms

Robert B. Ross

Misbah Mubarak

Christopher D. Carothers
