Doomsday: predicting which node will fail when on supercomputers

Anwesha Das, Frank Mueller, Paul Hargrove, Eric Roman, Scott B. Baden. Doomsday: predicting which node will fail when on supercomputers. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018, Dallas, TX, USA, November 11-16, 2018. IEEE / ACM, 2018.

@inproceedings{DasMHRB18,
  title = {Doomsday: predicting which node will fail when on supercomputers},
  author = {Anwesha Das and Frank Mueller and Paul Hargrove and Eric Roman and Scott B. Baden},
  year = {2018},
  url = {http://dl.acm.org/citation.cfm?id=3291668},
  researchr = {https://researchr.org/publication/DasMHRB18},
  cites = {0},
  citedby = {0},
  booktitle = {Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018, Dallas, TX, USA, November 11-16, 2018},
  publisher = {IEEE / ACM},
}