Optimistic Policy Optimization with Bandit Feedback

Lior Shani, Yonathan Efroni, Aviv Rosenberg, Shie Mannor. Optimistic Policy Optimization with Bandit Feedback. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Volume 119 of Proceedings of Machine Learning Research, pages 8604-8613, PMLR, 2020.