Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret

Ofer Dekel, Ambuj Tewari, Raman Arora. Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret. In Proceedings of the 29th International Conference on Machine Learning, ICML 2012, Edinburgh, Scotland, UK, June 26 - July 1, 2012. icml.cc / Omnipress, 2012.