DEJA-VU: Double Feature Presentation and Iterated Loss in Deep Transformer Networks

Andros Tjandra, Chunxi Liu, Frank Zhang, Xiaohui Zhang, Yongqiang Wang, Gabriel Synnaeve, Satoshi Nakamura, Geoffrey Zweig. DEJA-VU: Double Feature Presentation and Iterated Loss in Deep Transformer Networks. In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020, pages 6899-6903. IEEE, 2020.
