Improving checkpointing intervals by considering individual job failure probabilities

Alvaro Frank, Manuel Baumgartner, Reza Salkhordeh, André Brinkmann. Improving checkpointing intervals by considering individual job failure probabilities. In 35th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2021, Portland, OR, USA, May 17-21, 2021, pages 299-309. IEEE, 2021.
