An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions

Yao Ma, Tingting Zhao, Kohei Hatano, Masashi Sugiyama. An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions. In Toon Calders, Floriana Esposito, Eyke Hüllermeier, Rosa Meo, editors, Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2014, Nancy, France, September 15-19, 2014. Proceedings, Part II. Volume 8725 of Lecture Notes in Computer Science, pages 354-369, Springer, 2014.

@inproceedings{MaZHS14,
  title = {An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions},
  author = {Yao Ma and Tingting Zhao and Kohei Hatano and Masashi Sugiyama},
  year = {2014},
  doi = {10.1007/978-3-662-44851-9_23},
  url = {http://dx.doi.org/10.1007/978-3-662-44851-9_23},
  researchr = {https://researchr.org/publication/MaZHS14},
  pages = {354--369},
  booktitle = {Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2014, Nancy, France, September 15-19, 2014. Proceedings, Part II},
  editor = {Toon Calders and Floriana Esposito and Eyke Hüllermeier and Rosa Meo},
  volume = {8725},
  series = {Lecture Notes in Computer Science},
  publisher = {Springer},
  isbn = {978-3-662-44850-2},
}