Reward Learning as Doubly Nonparametric Bandits: Optimal Design and Scaling Laws

Kush Bhatia, Wenshuo Guo, Jacob Steinhardt. Reward Learning as Doubly Nonparametric Bandits: Optimal Design and Scaling Laws. In Francisco J. R. Ruiz, Jennifer G. Dy, Jan-Willem van de Meent, editors, International Conference on Artificial Intelligence and Statistics, 25-27 April 2023, Palau de Congressos, Valencia, Spain. Volume 206 of Proceedings of Machine Learning Research, pages 11149-11171, PMLR, 2023.

Authors

Kush Bhatia

Wenshuo Guo

Jacob Steinhardt
