Towards Lossless Head Pruning through Automatic Peer Distillation for Language Models

Bingbing Li, Zigeng Wang, Shaoyi Huang, Mikhail A. Bragin, Ji Li, Caiwen Ding. Towards Lossless Head Pruning through Automatic Peer Distillation for Language Models. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI 2023, 19th-25th August 2023, Macao, SAR, China. Pages 5113-5121, ijcai.org, 2023. doi:10.24963/ijcai.2023/568

@inproceedings{LiWHB0D23,
  title = {Towards Lossless Head Pruning through Automatic Peer Distillation for Language Models},
  author = {Bingbing Li and Zigeng Wang and Shaoyi Huang and Mikhail A. Bragin and Ji Li and Caiwen Ding},
  year = {2023},
  doi = {10.24963/ijcai.2023/568},
  url = {https://doi.org/10.24963/ijcai.2023/568},
  researchr = {https://researchr.org/publication/LiWHB0D23},
  pages = {5113--5121},
  booktitle = {Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI 2023, 19th-25th August 2023, Macao, SAR, China},
  publisher = {ijcai.org},
}