A Case for Epidemic Fault Detection and Group Membership in HPC Storage Systems

Shane Snyder, Philip H. Carns, Jonathan Jenkins, Kevin Harms, Robert B. Ross, Misbah Mubarak, Christopher D. Carothers. A Case for Epidemic Fault Detection and Group Membership in HPC Storage Systems. In Stephen A. Jarvis, Steven A. Wright, Simon D. Hammond, editors, High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation - 5th International Workshop, PMBS 2014, New Orleans, LA, USA, November 16, 2014. Revised Selected Papers. Volume 8966 of Lecture Notes in Computer Science, pages 237-248, Springer, 2014.

@inproceedings{SnyderCJHRMC14,
  title = {A Case for Epidemic Fault Detection and Group Membership in HPC Storage Systems},
  author = {Shane Snyder and Philip H. Carns and Jonathan Jenkins and Kevin Harms and Robert B. Ross and Misbah Mubarak and Christopher D. Carothers},
  year = {2014},
  doi = {10.1007/978-3-319-17248-4_12},
  url = {http://dx.doi.org/10.1007/978-3-319-17248-4_12},
  researchr = {https://researchr.org/publication/SnyderCJHRMC14},
  cites = {0},
  citedby = {0},
  pages = {237--248},
  booktitle = {High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation - 5th International Workshop, PMBS 2014, New Orleans, LA, USA, November 16, 2014. Revised Selected Papers},
  editor = {Stephen A. Jarvis and Steven A. Wright and Simon D. Hammond},
  volume = {8966},
  series = {Lecture Notes in Computer Science},
  publisher = {Springer},
  isbn = {978-3-319-17247-7},
}