A Study on Knowledge Distillation from Weak Teacher for Scaling Up Pre-trained Language Models

Hayeon Lee, Rui Hou 0007, Jongpil Kim, Davis Liang, Sung Ju Hwang, Alexander Min. A Study on Knowledge Distillation from Weak Teacher for Scaling Up Pre-trained Language Models. In Anna Rogers, Jordan L. Boyd-Graber, Naoaki Okazaki, editors, Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada, July 9-14, 2023, pages 11239-11246. Association for Computational Linguistics, 2023.

Authors

Hayeon Lee
Rui Hou 0007
Jongpil Kim
Davis Liang
Sung Ju Hwang
Alexander Min