Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy

Yuan Xie, Boyi Liu, Qiang Liu, Zhaoran Wang, Yuan Zhou, Jian Peng. Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019.
