Designing and Modelling Selective Replication for Fault-tolerant HPC Applications

Omer Subasi, Gulay Yalcin, Ferad Zyulkyarov, Osman S. Unsal, Jesús Labarta. Designing and Modelling Selective Replication for Fault-tolerant HPC Applications. In Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2017, Madrid, Spain, May 14-17, 2017. pages 452-457, IEEE Computer Society / ACM, 2017.