Exploring the Effects of Silent Data Corruption in Distributed Deep Learning Training

Elvis Rojas, Diego Pérez, Esteban Meneses. Exploring the Effects of Silent Data Corruption in Distributed Deep Learning Training. In 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), Bordeaux, France, November 2-5, 2022. Pages 21-30, IEEE, 2022.
