Sample Efficient On-Line Learning of Optimal Dialogue Policies with Kalman Temporal Differences

Olivier Pietquin, Matthieu Geist, Senthilkumar Chandramohan. Sample Efficient On-Line Learning of Optimal Dialogue Policies with Kalman Temporal Differences. In Toby Walsh, editor, IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011, pages 1878-1883. IJCAI/AAAI, 2011.

@inproceedings{PietquinGC11,
  title = {Sample Efficient On-Line Learning of Optimal Dialogue Policies with Kalman Temporal Differences},
  author = {Olivier Pietquin and Matthieu Geist and Senthilkumar Chandramohan},
  year = {2011},
  url = {http://ijcai.org/papers11/Papers/IJCAI11-314.pdf},
  researchr = {https://researchr.org/publication/PietquinGC11},
  pages = {1878--1883},
  booktitle = {IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011},
  editor = {Toby Walsh},
  publisher = {IJCAI/AAAI},
  isbn = {978-1-57735-516-8},
}