Optimistic Posterior Sampling for Reinforcement Learning: Worst-Case Regret Bounds

Shipra Agrawal, Randy Jia. Optimistic Posterior Sampling for Reinforcement Learning: Worst-Case Regret Bounds. Math. Oper. Res., 48(1):363-392, February 2023.

Abstract

Abstract is missing.