Ao Sun, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi, Maosong Sun. BurstEngine: An efficient distributed framework for training transformers on extremely long sequences of over 1M tokens. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2025, St. Louis, MO, USA, November 16-21, 2025. pages 1429-1445, ACM, 2025.