No Free Lunch: Overcoming Reward Gaming in AI Safety Gridworlds

Mariya Tsvarkaleva, Louise A. Dennis. No Free Lunch: Overcoming Reward Gaming in AI Safety Gridworlds. In Ibrahim Habli, Mark Sujan, Simos Gerasimou, Erwin Schoitsch, Friedemann Bitsch, editors, Computer Safety, Reliability, and Security. SAFECOMP 2021 Workshops - DECSoS, MAPSOD, DepDevOps, USDAI, and WAISE, York, UK, September 7, 2021, Proceedings. Volume 12853 of Lecture Notes in Computer Science, pages 226-238, Springer, 2021.