Checkpointing Orchestration: Toward a Scalable HPC Fault-Tolerant Environment

Hui Jin, Tao Ke, Yong Chen, Xian-He Sun. Checkpointing Orchestration: Toward a Scalable HPC Fault-Tolerant Environment. In 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2012, Ottawa, Canada, May 13-16, 2012. pages 276-283, IEEE, 2012.

@inproceedings{JinKCS12,
  title = {Checkpointing Orchestration: Toward a Scalable HPC Fault-Tolerant Environment},
  author = {Hui Jin and Tao Ke and Yong Chen and Xian-He Sun},
  year = {2012},
  doi = {10.1109/CCGrid.2012.61},
  url = {http://doi.ieeecomputersociety.org/10.1109/CCGrid.2012.61},
  researchr = {https://researchr.org/publication/JinKCS12},
  cites = {0},
  citedby = {0},
  pages = {276--283},
  booktitle = {12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2012, Ottawa, Canada, May 13-16, 2012},
  publisher = {IEEE},
  isbn = {978-1-4673-1395-7},
}