Hindsight PRIORs for Reward Learning from Human Preferences

Mudit Verma, Katherine Metcalf. Hindsight PRIORs for Reward Learning from Human Preferences. In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024.

Abstract

Abstract is missing.