MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers

Wenhui Wang, Hangbo Bao, Shaohan Huang, Li Dong, Furu Wei. MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers. In Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli, editors, Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Online, August 1-6, 2021, pages 2140-2151. Association for Computational Linguistics, 2021.

Authors

Wenhui Wang

Hangbo Bao

Shaohan Huang

Li Dong

Furu Wei
