Evaluating Multi-Level Checkpointing for Distributed Deep Neural Network Training

Quentin Anthony, Donglai Dai. Evaluating Multi-Level Checkpointing for Distributed Deep Neural Network Training. In 2021 SC Workshops Supplementary Proceedings, St. Louis, MO, USA, November 14-19, 2021. Pages 60-67, IEEE, 2021.
