Towards Theoretically Understanding Why SGD Generalizes Better Than Adam in Deep Learning

Pan Zhou, Jiashi Feng, Chao Ma, Caiming Xiong, Steven Chu Hong Hoi, Weinan E. Towards Theoretically Understanding Why SGD Generalizes Better Than Adam in Deep Learning. In Hugo Larochelle, Marc'Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual. 2020.
