Gradient temporal-difference learning for off-policy evaluation using emphatic weightings

Jiaqing Cao, Quan Liu, Fei Zhu, Qiming Fu, Shan Zhong. Gradient temporal-difference learning for off-policy evaluation using emphatic weightings. Inf. Sci., 580:311-330, 2021.

No references recorded for this publication.

No citations of this publication recorded.
