TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models

Zhuohan Li, Siyuan Zhuang, Shiyuan Guo, Danyang Zhuo, Hao Zhang, Dawn Song, Ion Stoica. TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models. In Marina Meila, Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event. Volume 139 of Proceedings of Machine Learning Research, pages 6543-6552, PMLR, 2021.

@inproceedings{LiZGZZSS21,
  title = {TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models},
  author = {Zhuohan Li and Siyuan Zhuang and Shiyuan Guo and Danyang Zhuo and Hao Zhang and Dawn Song and Ion Stoica},
  year = {2021},
  url = {http://proceedings.mlr.press/v139/li21y.html},
  researchr = {https://researchr.org/publication/LiZGZZSS21},
  pages = {6543-6552},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event},
  editor = {Marina Meila and Tong Zhang},
  volume = {139},
  series = {Proceedings of Machine Learning Research},
  publisher = {PMLR},
}