Convergence Proof for Actor-Critic Methods Applied to PPO and RUDDER

Markus Holzleitner, Lukas Gruber, José Antonio Arjona-Medina, Johannes Brandstetter, Sepp Hochreiter. Convergence Proof for Actor-Critic Methods Applied to PPO and RUDDER. Transactions on Large-Scale Data- and Knowledge-Centered Systems, 48:105-130, 2021. doi:10.1007/978-3-662-63519-3_5

@article{HolzleitnerGABH21,
  title = {Convergence Proof for Actor-Critic Methods Applied to PPO and RUDDER},
  author = {Markus Holzleitner and Lukas Gruber and José Antonio Arjona-Medina and Johannes Brandstetter and Sepp Hochreiter},
  year = {2021},
  doi = {10.1007/978-3-662-63519-3_5},
  url = {https://doi.org/10.1007/978-3-662-63519-3_5},
  researchr = {https://researchr.org/publication/HolzleitnerGABH21},
  journal = {Transactions on Large-Scale Data- and Knowledge-Centered Systems},
  volume = {48},
  pages = {105--130},
}