Online Markov Decision Processes under Bandit Feedback

Gergely Neu, András György, Csaba Szepesvári, András Antos. Online Markov Decision Processes under Bandit Feedback. In John D. Lafferty, Christopher K. I. Williams, John Shawe-Taylor, Richard S. Zemel, Aron Culotta, editors, Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, Vancouver, British Columbia, Canada. Pages 1804-1812, Curran Associates, Inc., 2010.

Abstract

Abstract is missing.