A Failure Prediction-Based Adaptive Checkpointing Method with Less Reliance on Temperature Monitoring for HPC Applications

Muhammad Alfian Amrizal, Pei Li, Mulya Agung, Ryusuke Egawa, Hiroyuki Takizawa. A Failure Prediction-Based Adaptive Checkpointing Method with Less Reliance on Temperature Monitoring for HPC Applications. In IEEE International Conference on Cluster Computing, CLUSTER 2018, Belfast, UK, September 10-13, 2018, pages 515-523. IEEE Computer Society, 2018.
