Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction

Aviral Kumar, Justin Fu, Matthew Soh, George Tucker, Sergey Levine. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction. In Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alché-Buc, Emily B. Fox, Roman Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pages 11761-11771, 2019.

Authors

Aviral Kumar

Justin Fu

Matthew Soh

George Tucker

Sergey Levine