Gradient temporal-difference learning for off-policy evaluation using emphatic weightings

Jiaqing Cao, Quan Liu, Fei Zhu, Qiming Fu, Shan Zhong. Gradient temporal-difference learning for off-policy evaluation using emphatic weightings. Inf. Sci., 580:311-330, 2021.

Authors

Jiaqing Cao

Quan Liu

Fei Zhu

Qiming Fu

Shan Zhong