TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs

Weiyang Wang, Moein Khazraee, Zhizhen Zhong, Manya Ghobadi, Zhihao Jia, Dheevatsa Mudigere, Ying Zhang, Anthony Kewitsch. TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs. In Mahesh Balakrishnan, Manya Ghobadi, editors, 20th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2023, Boston, MA, April 17-19, 2023. Pages 739-767, USENIX Association, 2023.

@inproceedings{WangKZGJM0K23,
  title = {TopoOpt: Co-optimizing Network Topology and Parallelization Strategy for Distributed Training Jobs},
  author = {Weiyang Wang and Moein Khazraee and Zhizhen Zhong and Manya Ghobadi and Zhihao Jia and Dheevatsa Mudigere and Ying Zhang and Anthony Kewitsch},
  year = {2023},
  url = {https://www.usenix.org/conference/nsdi23/presentation/wang-weiyang},
  researchr = {https://researchr.org/publication/WangKZGJM0K23},
  cites = {0},
  citedby = {0},
  pages = {739--767},
  booktitle = {20th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2023, Boston, MA, April 17-19, 2023},
  editor = {Mahesh Balakrishnan and Manya Ghobadi},
  publisher = {USENIX Association},
}