Checkpointing Orchestration: Toward a Scalable HPC Fault-Tolerant Environment

Hui Jin, Tao Ke, Yong Chen, Xian-He Sun. Checkpointing Orchestration: Toward a Scalable HPC Fault-Tolerant Environment. In 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2012, Ottawa, Canada, May 13-16, 2012. Pages 276-283, IEEE, 2012.