Batch learning from logged bandit feedback through counterfactual risk minimization: related publications

Adith Swaminathan, Thorsten Joachims. Batch learning from logged bandit feedback through counterfactual risk minimization. Journal of Machine Learning Research, 16:1731-1755, 2015. [doi]

The following publications are possibly variants of this publication:

Counterfactual Risk Minimization: Learning from Logged Bandit Feedback. Adith Swaminathan, Thorsten Joachims. ICML 2015: 814-823 [doi]

Counterfactual evaluation and learning from logged user feedback. Adith Swaminathan. PhD thesis, Cornell University, USA, 2017.

Variance-Minimizing Augmentation Logging for Counterfactual Evaluation in Contextual Bandits. Aaron David Tucker, Thorsten Joachims. WSDM 2023: 967-975 [doi]

Deep Learning with Logged Bandit Feedback. Thorsten Joachims, Adith Swaminathan, Maarten de Rijke. ICLR 2018 [doi]
