Bo Liu 0006, Ian Gemp, Mohammad Ghavamzadeh, Ji Liu 0002, Sridhar Mahadevan, Marek Petrik. Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity. J. Artif. Intell. Res. (JAIR), 63:461-494, 2018.