Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency

Heyang Zhao, Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu. Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency. In Gergely Neu, Lorenzo Rosasco, editors, The Thirty Sixth Annual Conference on Learning Theory, 12-15 July 2023, Bangalore, India. Volume 195 of Proceedings of Machine Learning Research, pages 4977-5020, PMLR, 2023.

Abstract

Abstract is missing.