Width of Minima Reached by Stochastic Gradient Descent is Influenced by Learning Rate to Batch Size Ratio

Stanislaw Jastrzebski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos J. Storkey. Width of Minima Reached by Stochastic Gradient Descent is Influenced by Learning Rate to Batch Size Ratio. In Věra Kůrková, Yannis Manolopoulos, Barbara Hammer, Lazaros S. Iliadis, Ilias Maglogiannis, editors, Artificial Neural Networks and Machine Learning - ICANN 2018 - 27th International Conference on Artificial Neural Networks, Rhodes, Greece, October 4-7, 2018, Proceedings, Part III. Volume 11141 of Lecture Notes in Computer Science, pages 392-402, Springer, 2018.

@inproceedings{JastrzebskiKABF18,
  title = {Width of Minima Reached by Stochastic Gradient Descent is Influenced by Learning Rate to Batch Size Ratio},
  author = {Stanislaw Jastrzebski and Zachary Kenton and Devansh Arpit and Nicolas Ballas and Asja Fischer and Yoshua Bengio and Amos J. Storkey},
  year = {2018},
  doi = {10.1007/978-3-030-01424-7_39},
  url = {https://doi.org/10.1007/978-3-030-01424-7_39},
  researchr = {https://researchr.org/publication/JastrzebskiKABF18},
  cites = {0},
  citedby = {0},
  pages = {392--402},
  booktitle = {Artificial Neural Networks and Machine Learning - ICANN 2018 - 27th International Conference on Artificial Neural Networks, Rhodes, Greece, October 4-7, 2018, Proceedings, Part III},
  editor = {V{\v{e}}ra K{\r{u}}rkov{\'a} and Yannis Manolopoulos and Barbara Hammer and Lazaros S. Iliadis and Ilias Maglogiannis},
  volume = {11141},
  series = {Lecture Notes in Computer Science},
  publisher = {Springer},
  isbn = {978-3-030-01424-7},
}