Supporting task-level fault-tolerance in HPC workflows by launching MPI jobs inside MPI jobs

Matthieu Dorier, Justin M. Wozniak, Robert B. Ross. Supporting task-level fault-tolerance in HPC workflows by launching MPI jobs inside MPI jobs. In Johan Montagnat, Ian Taylor, Sandra Gesing, Rizos Sakellariou, editors, Proceedings of the 12th Workshop on Workflows in Support of Large-Scale Science, WORKS@SC 2017, Denver, CO, USA, November 12-17, 2017. ACM, 2017.
