Scalable Distributed Training of Recommendation Models: An ASTRA-SIM + NS3 case-study with TCP/IP transport

Saeed Rashidi, Pallavi Shurpali, Srinivas Sridharan, Naader Hassani, Dheevatsa Mudigere, Krishnakumar Nair, Misha Smelyanski, Tushar Krishna. Scalable Distributed Training of Recommendation Models: An ASTRA-SIM + NS3 case-study with TCP/IP transport. In IEEE Symposium on High-Performance Interconnects, HOTI 2020, Piscataway, NJ, USA, August 19-21, 2020. Pages 33-42, IEEE, 2020.

Abstract

Abstract is missing.