Victor Boone, Bruno Gaujal. Logarithmic regret of exploration in average reward Markov decision processes. In Nika Haghtalab, Ankur Moitra, editors, The Thirty Eighth Annual Conference on Learning Theory, 30 June to 4 July 2025, Lyon, France. Volume 291 of Proceedings of Machine Learning Research, pages 454-533, PMLR, 2025.