Exploit Reward Shifting in Value-Based Deep-RL: Optimistic Curiosity-Based Exploration and Conservative Exploitation via Linear Reward Shaping

Hao Sun, Lei Han, Rui Yang 0010, Xiaoteng Ma, Jian Guo, Bolei Zhou. Exploit Reward Shifting in Value-Based Deep-RL: Optimistic Curiosity-Based Exploration and Conservative Exploitation via Linear Reward Shaping. In Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, A. Oh, editors, Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022.

Authors

Hao Sun

Lei Han

Rui Yang 0010

Xiaoteng Ma

Jian Guo

Bolei Zhou
