On Layer Normalization in the Transformer Architecture

Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu. On Layer Normalization in the Transformer Architecture. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Volume 119 of Proceedings of Machine Learning Research, pages 10524-10533, PMLR, 2020.

Authors

Ruibin Xiong

Yunchang Yang

Di He

Kai Zheng

Shuxin Zheng

Chen Xing

Huishuai Zhang

Yanyan Lan

Liwei Wang

Tie-Yan Liu
