Understanding the Failure of Batch Normalization for Transformers in NLP

Jiaxi Wang, Ji Wu, Lei Huang. Understanding the Failure of Batch Normalization for Transformers in NLP. In Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, A. Oh, editors, Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022. 2022.

@inproceedings{WangWH22-8,
  title = {Understanding the Failure of Batch Normalization for Transformers in NLP},
  author = {Jiaxi Wang and Ji Wu and Lei Huang},
  year = {2022},
  url = {http://papers.nips.cc/paper_files/paper/2022/hash/f4f2f2b3c67da711df6df557fc870c4a-Abstract-Conference.html},
  researchr = {https://researchr.org/publication/WangWH22-8},
  cites = {0},
  citedby = {0},
  booktitle = {Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022},
  editor = {Sanmi Koyejo and S. Mohamed and A. Agarwal and Danielle Belgrave and K. Cho and A. Oh},
  isbn = {9781713871088},
}