Logarithmic regret of exploration in average reward Markov decision processes

Victor Boone, Bruno Gaujal. Logarithmic regret of exploration in average reward Markov decision processes. In Nika Haghtalab, Ankur Moitra, editors, The Thirty Eighth Annual Conference on Learning Theory, 30 June-4 July 2025, Lyon, France. Volume 291 of Proceedings of Machine Learning Research, pages 454-533, PMLR, 2025.
