Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, Joshua M. Susskind. Stabilizing Transformer Training by Preventing Attention Entropy Collapse. In Andreas Krause, Emma Brunskill, KyungHyun Cho, Barbara Engelhardt, Sivan Sabato, Jonathan Scarlett, editors, International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA. Volume 202 of Proceedings of Machine Learning Research, pages 40770-40803, PMLR, 2023.
@inproceedings{ZhaiLLBR0GS23,
  title = {Stabilizing Transformer Training by Preventing Attention Entropy Collapse},
  author = {Shuangfei Zhai and Tatiana Likhomanenko and Etai Littwin and Dan Busbridge and Jason Ramapuram and Yizhe Zhang and Jiatao Gu and Joshua M. Susskind},
  year = {2023},
  url = {https://proceedings.mlr.press/v202/zhai23a.html},
  researchr = {https://researchr.org/publication/ZhaiLLBR0GS23},
  cites = {0},
  citedby = {0},
  pages = {40770--40803},
  booktitle = {International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA},
  editor = {Andreas Krause and Emma Brunskill and KyungHyun Cho and Barbara Engelhardt and Sivan Sabato and Jonathan Scarlett},
  volume = {202},
  series = {Proceedings of Machine Learning Research},
  publisher = {PMLR},
}