Raman Arora, Ofer Dekel, Ambuj Tewari. Deterministic MDPs with Adversarial Rewards and Bandit Feedback. In Nando de Freitas, Kevin P. Murphy, editors, Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, Catalina Island, CA, USA, August 14-18, 2012. pages 93-101, AUAI Press, 2012.