No Free Lunch: Overcoming Reward Gaming in AI Safety Gridworlds

Mariya Tsvarkaleva, Louise A. Dennis. No Free Lunch: Overcoming Reward Gaming in AI Safety Gridworlds. In Ibrahim Habli, Mark Sujan, Simos Gerasimou, Erwin Schoitsch, Friedemann Bitsch, editors, Computer Safety, Reliability, and Security. SAFECOMP 2021 Workshops - DECSoS, MAPSOD, DepDevOps, USDAI, and WAISE, York, UK, September 7, 2021, Proceedings. Volume 12853 of Lecture Notes in Computer Science, pages 226-238, Springer, 2021.

Authors

Mariya Tsvarkaleva


Louise A. Dennis
