Shrink or Substitute: Handling Process Failures in HPC Systems Using In-Situ Recovery

Rizwan A. Ashraf, Saurabh Hukerikar, Christian Engelmann. Shrink or Substitute: Handling Process Failures in HPC Systems Using In-Situ Recovery. In Ivan Merelli, Pietro Liò, Igor V. Kotenko, editors, 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing, PDP 2018, Cambridge, United Kingdom, March 21-23, 2018. Pages 178-185, IEEE Computer Society, 2018.

Authors

Rizwan A. Ashraf

Saurabh Hukerikar

Christian Engelmann
