A Study on Knowledge Distillation from Weak Teacher for Scaling Up Pre-trained Language Models

Hayeon Lee, Rui Hou 0007, Jongpil Kim, Davis Liang, Sung Ju Hwang, Alexander Min. A Study on Knowledge Distillation from Weak Teacher for Scaling Up Pre-trained Language Models. In Anna Rogers, Jordan L. Boyd-Graber, Naoaki Okazaki, editors, Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada, July 9-14, 2023, pages 11239-11246. Association for Computational Linguistics, 2023.

Authors

Hayeon Lee
Rui Hou 0007
Jongpil Kim
Davis Liang
Sung Ju Hwang
Alexander Min