EReinit: Scalable and efficient fault-tolerance for bulk-synchronous MPI applications

Sourav Chakraborty, Ignacio Laguna, Murali Emani, Kathryn Mohror, Dhabaleswar K. Panda, Martin Schulz, Hari Subramoni. EReinit: Scalable and efficient fault-tolerance for bulk-synchronous MPI applications. Concurrency - Practice and Experience, 32(3), 2020.
