DEFT: Exploiting Gradient Norm Difference between Model Layers for Scalable Gradient Sparsification

Daegun Yoon, Sangyoon Oh. DEFT: Exploiting Gradient Norm Difference between Model Layers for Scalable Gradient Sparsification. In Proceedings of the 52nd International Conference on Parallel Processing (ICPP 2023), Salt Lake City, UT, USA, August 7-10, 2023, pages 746-755. ACM, 2023.
