Temporal-difference emphasis learning with regularized correction for off-policy evaluation and control

Jiaqing Cao, Quan Liu, Lan Wu, Qiming Fu, Shan Zhong. Temporal-difference emphasis learning with regularized correction for off-policy evaluation and control. Appl. Intell., 53(18):20917-20937, September 2023.