DEFT: Exploiting Gradient Norm Difference between Model Layers for Scalable Gradient Sparsification

Daegun Yoon, Sangyoon Oh. DEFT: Exploiting Gradient Norm Difference between Model Layers for Scalable Gradient Sparsification. In Proceedings of the 52nd International Conference on Parallel Processing (ICPP 2023), Salt Lake City, UT, USA, August 7-10, 2023, pages 746-755. ACM, 2023.