MMPI: A Scalable Fault Tolerance Mechanism for MPI Large Scale Parallel Computing

Zhiyuan Wang, Xuejun Yang, Yun Zhou. MMPI: A Scalable Fault Tolerance Mechanism for MPI Large Scale Parallel Computing. In 10th IEEE International Conference on Computer and Information Technology, CIT 2010, Bradford, West Yorkshire, UK, June 29-July 1, 2010, pages 1251-1256. IEEE Computer Society, 2010.

Abstract

Abstract is missing.