On Layer Normalization in the Transformer Architecture

Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu. On Layer Normalization in the Transformer Architecture. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Volume 119 of Proceedings of Machine Learning Research, pages 10524-10533, PMLR, 2020. [doi]

@inproceedings{XiongYHZZXZLWL20,
  title = {On Layer Normalization in the Transformer Architecture},
  author = {Ruibin Xiong and Yunchang Yang and Di He and Kai Zheng and Shuxin Zheng and Chen Xing and Huishuai Zhang and Yanyan Lan and Liwei Wang and Tie-Yan Liu},
  year = {2020},
  url = {http://proceedings.mlr.press/v119/xiong20b.html},
  researchr = {https://researchr.org/publication/XiongYHZZXZLWL20},
  cites = {0},
  citedby = {0},
  pages = {10524-10533},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event},
  volume = {119},
  series = {Proceedings of Machine Learning Research},
  publisher = {PMLR},
}