Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training

Atli Kosson, Bettina Messmer, Martin Jaggi. Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training. In Amir Globerson, Lester Mackey, Danielle Belgrave, Angela Fan, Ulrich Paquet, Jakub M. Tomczak, Cheng Zhang, editors, Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10-15, 2024.

Authors

Atli Kosson

Bettina Messmer

Martin Jaggi
