No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models

Chen Liang, Haoming Jiang, Simiao Zuo, Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Tuo Zhao. No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net, 2022.
