Learning Adversarial Linear Mixture Markov Decision Processes with Bandit Feedback and Unknown Transition

Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Shuai Li. Learning Adversarial Linear Mixture Markov Decision Processes with Bandit Feedback and Unknown Transition. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023.

Abstract

Abstract is missing.