A reliability-aware approach for an optimal checkpoint/restart model in HPC environments

Yudan Liu, Raja Nassar, Chokchai Leangsuksun, Nichamon Naksinehaboon, Mihaela Paun, Stephen L. Scott. A reliability-aware approach for an optimal checkpoint/restart model in HPC environments. In Proceedings of the 2007 IEEE International Conference on Cluster Computing, 17-20 September 2007, Austin, Texas, USA, pages 452-457. IEEE, 2007.
