Supporting task-level fault-tolerance in HPC workflows by launching MPI jobs inside MPI jobs

Matthieu Dorier, Justin M. Wozniak, Robert B. Ross. Supporting task-level fault-tolerance in HPC workflows by launching MPI jobs inside MPI jobs. In Johan Montagnat, Ian Taylor, Sandra Gesing, Rizos Sakellariou, editors, Proceedings of the 12th Workshop on Workflows in Support of Large-Scale Science, WORKS@SC 2017, Denver, CO, USA, November 12-17, 2017. ACM, 2017.
