Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy

Yuan Xie, Boyi Liu, Qiang Liu 0001, Zhaoran Wang, Yuan Zhou, Jian Peng 0001. Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019.

Authors

Yuan Xie

Boyi Liu

Qiang Liu 0001

Zhaoran Wang

Yuan Zhou

Jian Peng 0001
