Towards Lossless Head Pruning through Automatic Peer Distillation for Language Models

Bingbing Li, Zigeng Wang, Shaoyi Huang, Mikhail A. Bragin, Ji Li, Caiwen Ding. Towards Lossless Head Pruning through Automatic Peer Distillation for Language Models. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI 2023, 19th-25th August 2023, Macao, SAR, China. Pages 5113-5121, ijcai.org, 2023.
