The following publications are possibly variants of this publication:
- Joint structured pruning and dense knowledge distillation for efficient transformer model compression. Baiyun Cui, Yingming Li, Zhongfei Zhang. ijon, 458:56-69, 2021. [doi]
- Adaptive Contrastive Knowledge Distillation for BERT Compression. Jinyang Guo, Jiaheng Liu, Zining Wang, Yuqing Ma, Ruihao Gong, Ke Xu, Xianglong Liu 0001. acl 2023: 8941-8953. [doi]
- Model Compression Using Progressive Channel Pruning. Jinyang Guo, Weichen Zhang, Wanli Ouyang, Dong Xu 0001. tcsv, 31(3):1114-1124, 2021. [doi]