Logarithmic regret of exploration in average reward Markov decision processes

Victor Boone, Bruno Gaujal. Logarithmic regret of exploration in average reward Markov decision processes. In Nika Haghtalab, Ankur Moitra, editors, The Thirty Eighth Annual Conference on Learning Theory, 30 June - 4 July 2025, Lyon, France. Volume 291 of Proceedings of Machine Learning Research, pages 454-533, PMLR, 2025.
