Online Markov Decision Processes with Aggregate Bandit Feedback

Alon Cohen, Haim Kaplan, Tomer Koren, Yishay Mansour. Online Markov Decision Processes with Aggregate Bandit Feedback. In Mikhail Belkin, Samory Kpotufe, editors, Conference on Learning Theory, COLT 2021, 15-19 August 2021, Boulder, Colorado, USA. Volume 134 of Proceedings of Machine Learning Research, pages 1301-1329, PMLR, 2021.

Abstract

Abstract is missing.