Stabilizing Policy Gradient Methods via Reward Profiling

Shihab Ahmed, El Houcine Bergou, Yue Wang, Aritra Dutta. Stabilizing Policy Gradient Methods via Reward Profiling. In Sven Koenig, Chad Jenkins, Matthew E. Taylor, editors, Fortieth AAAI Conference on Artificial Intelligence, Thirty-Eighth Conference on Innovative Applications of Artificial Intelligence, Sixteenth Symposium on Educational Advances in Artificial Intelligence, AAAI 2026, Singapore, January 20-27, 2026, pages 19560-19568. AAAI Press, 2026.
