Scaling FP8 training to trillion-token LLMs

Maxim Fishman, Brian Chmiel, Ron Banner, Daniel Soudry. Scaling FP8 training to trillion-token LLMs. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net, 2025.

@inproceedings{FishmanCBS25,
  title = {Scaling FP8 training to trillion-token LLMs},
  author = {Maxim Fishman and Brian Chmiel and Ron Banner and Daniel Soudry},
  year = {2025},
  url = {https://openreview.net/forum?id=E1EHO0imOb},
  researchr = {https://researchr.org/publication/FishmanCBS25},
  cites = {0},
  citedby = {0},
  booktitle = {The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025},
  publisher = {OpenReview.net},
}