An improved upper bound on the expected regret of UCB-type policies for a matching-selection bandit problem

Ryo Watanabe, Atsuyoshi Nakamura, Mineichi Kudo. An improved upper bound on the expected regret of UCB-type policies for a matching-selection bandit problem. Oper. Res. Lett., 43(6):558-563, 2015.
