Scalable group-based checkpoint/restart for large-scale message-passing systems

Justin C. Y. Ho, Cho-Li Wang, Francis C. M. Lau. Scalable group-based checkpoint/restart for large-scale message-passing systems. In 22nd IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2008, Miami, Florida, USA, April 14-18, 2008. Pages 1-12, IEEE, 2008.

Abstract

Abstract is missing.