Universal-KD: Attention-based Output-Grounded Intermediate Layer Knowledge Distillation

Yimeng Wu, Mehdi Rezagholizadeh, Abbas Ghaddar, Md. Akmal Haidar, Ali Ghodsi. Universal-KD: Attention-based Output-Grounded Intermediate Layer Knowledge Distillation. In Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih, editors, Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), Virtual Event / Punta Cana, Dominican Republic, 7-11 November 2021, pages 7649-7661. Association for Computational Linguistics, 2021.
