Distributed Training of Large Language Models on AWS Trainium

Xinwei Fu, Zhen Zhang, Haozheng Fan, Guangtai Huang, Mohammad El-Shabani, Randy Huang, Rahul Solanki, Fei Wu, Ron Diamant, Yida Wang. Distributed Training of Large Language Models on AWS Trainium. In Proceedings of the 2024 ACM Symposium on Cloud Computing, SoCC 2024, Redmond, WA, USA, November 20-22, 2024. Pages 961-976, ACM, 2024.

