Batch learning from logged bandit feedback through counterfactual risk minimization

Adith Swaminathan, Thorsten Joachims. Batch learning from logged bandit feedback through counterfactual risk minimization. Journal of Machine Learning Research, 16:1731-1755, 2015.
