Fine-Tuning Language Models with Reward Learning on Policy

Hao Lang, Fei Huang, Yongbin Li. Fine-Tuning Language Models with Reward Learning on Policy. In Kevin Duh, Helena Gómez-Adorno, Steven Bethard, editors, Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), NAACL 2024, Mexico City, Mexico, June 16-21, 2024. Pages 1382-1392, Association for Computational Linguistics, 2024.