Gradient temporal-difference learning for off-policy evaluation using emphatic weightings

Jiaqing Cao, Quan Liu, Fei Zhu, Qiming Fu, Shan Zhong. Gradient temporal-difference learning for off-policy evaluation using emphatic weightings. Inf. Sci., 580:311-330, 2021.
