H3T: Efficient Integration of Memory Optimization and Parallelism for Large-scale Transformer Training

Yuzhong Wang, Xu Han, Weilin Zhao, Guoyang Zeng, Zhiyuan Liu, Maosong Sun. H3T: Efficient Integration of Memory Optimization and Parallelism for Large-scale Transformer Training. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, Sergey Levine, editors, Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10-16, 2023.

Authors

Yuzhong Wang

Xu Han

Weilin Zhao

Guoyang Zeng

Zhiyuan Liu

Maosong Sun
