Towards Lossless Head Pruning through Automatic Peer Distillation for Language Models

Bingbing Li, Zigeng Wang, Shaoyi Huang, Mikhail A. Bragin, Ji Li, Caiwen Ding. Towards Lossless Head Pruning through Automatic Peer Distillation for Language Models. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI 2023, 19th-25th August 2023, Macao SAR, China. Pages 5113-5121, ijcai.org, 2023.
