Deterministic MDPs with Adversarial Rewards and Bandit Feedback

Raman Arora, Ofer Dekel, Ambuj Tewari. Deterministic MDPs with Adversarial Rewards and Bandit Feedback. In Nando de Freitas and Kevin P. Murphy, editors, Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, Catalina Island, CA, USA, August 14-18, 2012, pages 93-101. AUAI Press, 2012.
