Fine-Tuning Language Models with Reward Learning on Policy

Hao Lang, Fei Huang, Yongbin Li. Fine-Tuning Language Models with Reward Learning on Policy. In Kevin Duh, Helena Gómez-Adorno, Steven Bethard, editors, Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), NAACL 2024, Mexico City, Mexico, June 16-21, 2024, pages 1382-1392. Association for Computational Linguistics, 2024.

Authors

Hao Lang

Fei Huang

Yongbin Li