Learning to Plan Variable Length Sequences of Actions with a Cascading Bandit Click Model of User Feedback

Anirban Santara, Gaurav Aggarwal, Shuai Li, Claudio Gentile. Learning to Plan Variable Length Sequences of Actions with a Cascading Bandit Click Model of User Feedback. In Gustau Camps-Valls, Francisco J. R. Ruiz, Isabel Valera, editors, International Conference on Artificial Intelligence and Statistics, AISTATS 2022, 28-30 March 2022, Virtual Event. Volume 151 of Proceedings of Machine Learning Research, pages 767-797, PMLR, 2022.
