System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models

Sam Ade Jacobs, Masahiro Tanaka, Chengming Zhang, Minjia Zhang, Reza Yazdani Aminabadi, Shuaiwen Leon Song, Samyam Rajbhandari, Yuxiong He. System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models. In IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024 - Workshop, San Francisco, CA, USA, May 27-31, 2024. Pages 1206-1208, IEEE, 2024.
