Scaling FP8 training to trillion-token LLMs

Maxim Fishman, Brian Chmiel, Ron Banner, Daniel Soudry. Scaling FP8 training to trillion-token LLMs. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net, 2025.

Authors

Maxim Fishman

Brian Chmiel

Ron Banner

Daniel Soudry
