Cautious policy programming: exploiting KL regularization for monotonic policy improvement in reinforcement learning

Lingwei Zhu, Takamitsu Matsubara. Cautious policy programming: exploiting KL regularization for monotonic policy improvement in reinforcement learning. Machine Learning, 112(11):4527-4562, November 2023.
