V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control

H. Francis Song, Abbas Abdolmaleki, Jost Tobias Springenberg, Aidan Clark, Hubert Soyer, Jack W. Rae, Seb Noury, Arun Ahuja, Siqi Liu, Dhruva Tirumala, Nicolas Heess, Dan Belov, Martin A. Riedmiller, Matthew M. Botvinick. V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net, 2020.

Authors

H. Francis Song

Abbas Abdolmaleki

Jost Tobias Springenberg

Aidan Clark

Hubert Soyer

Jack W. Rae

Seb Noury

Arun Ahuja

Siqi Liu

Dhruva Tirumala

Nicolas Heess

Dan Belov

Martin A. Riedmiller

Matthew M. Botvinick
