Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning

Prasenjit Karmakar, Shalabh Bhatnagar. Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning. Math. Oper. Res., 43(1):130-151, 2018.

Abstract

Abstract is missing.