Enabling Parallelism Hot Switching for Efficient Training of Large Language Models

Hao Ge, Fangcheng Fu, Haoyang Li, Xuanyu Wang, Sheng Lin, Yujie Wang, Xiaonan Nie, Hailin Zhang 0004, Xupeng Miao, Bin Cui 0001. Enabling Parallelism Hot Switching for Efficient Training of Large Language Models. In Emmett Witchel, Christopher J. Rossbach, Andrea C. Arpaci-Dusseau, Kimberly Keeton, editors, Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, SOSP 2024, Austin, TX, USA, November 4-6, 2024. pages 178-194, ACM, 2024.

Authors

Hao Ge

Fangcheng Fu

Haoyang Li

Xuanyu Wang

Sheng Lin

Yujie Wang

Xiaonan Nie

Hailin Zhang 0004

Xupeng Miao

Bin Cui 0001
