Width of Minima Reached by Stochastic Gradient Descent is Influenced by Learning Rate to Batch Size Ratio

Stanislaw Jastrzebski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos J. Storkey. Width of Minima Reached by Stochastic Gradient Descent is Influenced by Learning Rate to Batch Size Ratio. In Vera Kurková, Yannis Manolopoulos, Barbara Hammer, Lazaros S. Iliadis, Ilias Maglogiannis, editors, Artificial Neural Networks and Machine Learning - ICANN 2018 - 27th International Conference on Artificial Neural Networks, Rhodes, Greece, October 4-7, 2018, Proceedings, Part III. Volume 11141 of Lecture Notes in Computer Science, pages 392-402, Springer, 2018.
