Policy mirror descent for reinforcement learning: linear convergence, new sampling complexity, and generalized problem classes

Guanghui Lan. Policy mirror descent for reinforcement learning: linear convergence, new sampling complexity, and generalized problem classes. Math. Program., 198(1):1059-1106, March 2023.