Confronting Reward Model Overoptimization with Constrained RLHF

Ted Moskovitz, Aaditya K. Singh, DJ Strouse, Tuomas Sandholm, Ruslan Salakhutdinov, Anca D. Dragan, Stephen Marcus McAleer. Confronting Reward Model Overoptimization with Constrained RLHF. In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024.
