A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima

Zeke Xie, Issei Sato, Masashi Sugiyama. A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021.

Authors

Zeke Xie

Issei Sato

Masashi Sugiyama