Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention

Biao Zhang, Ivan Titov, Rico Sennrich. Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention. In Kentaro Inui, Jing Jiang, Vincent Ng, Xiaojun Wan, editors, Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), Hong Kong, China, November 3-7, 2019, pages 898-909. Association for Computational Linguistics, 2019.