Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs

John Thorpe, Pengzhan Zhao, Jonathan Eyolfson, Yifan Qiao, Zhihao Jia, Minjia Zhang, Ravi Netravali, Guoqing Harry Xu. Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs. In Mahesh Balakrishnan, Manya Ghobadi, editors, 20th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2023, Boston, MA, April 17-19, 2023. Pages 497-513, USENIX Association, 2023.

Authors

John Thorpe

Pengzhan Zhao

Jonathan Eyolfson

Yifan Qiao

Zhihao Jia

Minjia Zhang

Ravi Netravali

Guoqing Harry Xu
