Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity

Bo Liu, Ian Gemp, Mohammad Ghavamzadeh, Ji Liu, Sridhar Mahadevan, Marek Petrik. Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity. J. Artif. Intell. Res. (JAIR), 63:461-494, 2018.
