The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models

Alexander Pan, Kush Bhatia, Jacob Steinhardt. The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net, 2022.
