Learning to Plan Variable Length Sequences of Actions with a Cascading Bandit Click Model of User Feedback

Anirban Santara, Gaurav Aggarwal, Shuai Li, Claudio Gentile. Learning to Plan Variable Length Sequences of Actions with a Cascading Bandit Click Model of User Feedback. In Gustau Camps-Valls, Francisco J. R. Ruiz, Isabel Valera, editors, International Conference on Artificial Intelligence and Statistics, AISTATS 2022, 28-30 March 2022, Virtual Event. Volume 151 of Proceedings of Machine Learning Research, pages 767-797, PMLR, 2022.

Authors

Anirban Santara

Gaurav Aggarwal

Shuai Li

Claudio Gentile
