Block-Checksum-Based Fault Tolerance for Matrix Multiplication on Large-Scale Parallel Systems

Yanchao Zhu, Yi Liu, Mingzhen Li, Depei Qian. Block-Checksum-Based Fault Tolerance for Matrix Multiplication on Large-Scale Parallel Systems. In 20th IEEE International Conference on High Performance Computing and Communications; 16th IEEE International Conference on Smart City; 4th IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2018, Exeter, United Kingdom, June 28-30, 2018, pages 172-179. IEEE, 2018.
