Deterministic MDPs with Adversarial Rewards and Bandit Feedback

Raman Arora, Ofer Dekel, Ambuj Tewari. Deterministic MDPs with Adversarial Rewards and Bandit Feedback. In Nando de Freitas, Kevin P. Murphy, editors, Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, Catalina Island, CA, USA, August 14-18, 2012. pages 93-101, AUAI Press, 2012.

Authors

Raman Arora

Ofer Dekel

Ambuj Tewari
