An asymptotically optimal policy for finite support models in the multiarmed bandit problem

Junya Honda, Akimichi Takemura. An asymptotically optimal policy for finite support models in the multiarmed bandit problem. Machine Learning, 85(3):361-391, 2011.
