Towards Lossless Head Pruning through Automatic Peer Distillation for Language Models

Bingbing Li, Zigeng Wang, Shaoyi Huang, Mikhail A. Bragin, Ji Li 0006, Caiwen Ding. Towards Lossless Head Pruning through Automatic Peer Distillation for Language Models. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI 2023, 19th-25th August 2023, Macao, SAR, China. pages 5113-5121, ijcai.org, 2023.

Authors

Bingbing Li

Zigeng Wang

Shaoyi Huang

Mikhail A. Bragin

Ji Li 0006

Caiwen Ding
