CoLoR: Co-Located Rescuers for Fault Tolerance in HPC Systems

Zaeem Hussain, Xiaolong Cui, Taieb Znati, Rami G. Melhem. CoLoR: Co-Located Rescuers for Fault Tolerance in HPC Systems. In 24th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2018, Singapore, December 11-13, 2018. pages 569-576, IEEE, 2018.

@inproceedings{HussainCZM18,
  title = {CoLoR: Co-Located Rescuers for Fault Tolerance in HPC Systems},
  author = {Zaeem Hussain and Xiaolong Cui and Taieb Znati and Rami G. Melhem},
  year = {2018},
  doi = {10.1109/PADSW.2018.8644528},
  url = {https://doi.org/10.1109/PADSW.2018.8644528},
  researchr = {https://researchr.org/publication/HussainCZM18},
  cites = {0},
  citedby = {0},
  pages = {569-576},
  booktitle = {24th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2018, Singapore, December 11-13, 2018},
  publisher = {IEEE},
  isbn = {978-1-5386-7308-9},
}