Systemic Assessment of Node Failures in HPC Production Platforms

Anwesha Das, Frank Mueller, Barry Rountree. Systemic Assessment of Node Failures in HPC Production Platforms. In 35th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2021, Portland, OR, USA, May 17-21, 2021. Pages 267-276, IEEE, 2021.

Authors

Anwesha Das

Frank Mueller

Barry Rountree
