V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control

H. Francis Song, Abbas Abdolmaleki, Jost Tobias Springenberg, Aidan Clark, Hubert Soyer, Jack W. Rae, Seb Noury, Arun Ahuja, Siqi Liu, Dhruva Tirumala, Nicolas Heess, Dan Belov, Martin A. Riedmiller, Matthew M. Botvinick. V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net, 2020.

@inproceedings{SongASCSRNALTHB20,
  title = {V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control},
  author = {H. Francis Song and Abbas Abdolmaleki and Jost Tobias Springenberg and Aidan Clark and Hubert Soyer and Jack W. Rae and Seb Noury and Arun Ahuja and Siqi Liu and Dhruva Tirumala and Nicolas Heess and Dan Belov and Martin A. Riedmiller and Matthew M. Botvinick},
  year = {2020},
  url = {https://openreview.net/forum?id=SylOlp4FvH},
  researchr = {https://researchr.org/publication/SongASCSRNALTHB20},
  cites = {0},
  citedby = {0},
  booktitle = {8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020},
  publisher = {OpenReview.net},
}