How to Evaluate Reward Models for RLHF

Evan Frick, Tianle Li, Connor Chen, Wei-Lin Chiang, Anastasios Nikolas Angelopoulos, Jiantao Jiao, Banghua Zhu, Joseph E. Gonzalez, Ion Stoica. How to Evaluate Reward Models for RLHF. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net, 2025.

The following publications are possibly variants of this publication:

Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective. Jiawei Huang, Bingcong Li, Christoph Dann, Niao He. ICML 2025.

RuleAdapter: Dynamic Rules for training Safety Reward Models in RLHF. Xiaomin Li, Mingye Gao, Zhiwei Zhang, Jingxuan Fan, Weiyu Li. ICML 2025.
