Towards Low-Overhead Resilience for Data Parallel Deep Learning

Bogdan Nicolae, Tanner Hobson, Orcun Yildiz, Tom Peterka, Dmitriy Morozov. Towards Low-Overhead Resilience for Data Parallel Deep Learning. In 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022, Taormina, Italy, May 16-19, 2022. Pages 336-345, IEEE, 2022. doi:10.1109/CCGrid54584.2022.00043

@inproceedings{NicolaeHYPM22,
  title = {Towards Low-Overhead Resilience for Data Parallel Deep Learning},
  author = {Bogdan Nicolae and Tanner Hobson and Orcun Yildiz and Tom Peterka and Dmitriy Morozov},
  year = {2022},
  doi = {10.1109/CCGrid54584.2022.00043},
  url = {https://doi.org/10.1109/CCGrid54584.2022.00043},
  researchr = {https://researchr.org/publication/NicolaeHYPM22},
  cites = {0},
  citedby = {0},
  pages = {336--345},
  booktitle = {22nd IEEE International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022, Taormina, Italy, May 16-19, 2022},
  publisher = {IEEE},
  isbn = {978-1-6654-9956-9},
}