ODIN: Disentangled Reward Mitigates Hacking in RLHF

Lichang Chen, Chen Zhu, Jiuhai Chen, Davit Soselia, Tianyi Zhou, Tom Goldstein, Heng Huang, Mohammad Shoeybi, Bryan Catanzaro. ODIN: Disentangled Reward Mitigates Hacking in RLHF. In Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27, 2024. OpenReview.net, 2024.
