Per-decision Multi-step Temporal Difference Learning with Control Variates

Kristopher De Asis, Richard S. Sutton. Per-decision Multi-step Temporal Difference Learning with Control Variates. In Amir Globerson, Ricardo Silva, editors, Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, UAI 2018, Monterey, California, USA, August 6-10, 2018. pages 786-794, AUAI Press, 2018.
