The following publications are possibly variants of this publication:
- Reward Shaping with Recurrent Neural Networks for Speeding up On-Line Policy Learning in Spoken Dialogue Systems. Pei-hao Su, David Vandyke, Milica Gasic, Nikola Mrksic, Tsung-Hsien Wen, Steve J. Young. sigdial 2015: 417-421.
- On-line policy optimisation of Bayesian spoken dialogue systems via human interaction. Milica Gasic, Catherine Breslin, Matthew Henderson, DongHo Kim, Martin Szummer, Blaise Thomson, Pirros Tsiakoulis, Steve Young. icassp 2013: 8367-8371.
- Reinforcement learning and reward estimation for dialogue policy optimisation. Pei-hao Su. PhD thesis, University of Cambridge, UK, 2018.
- On-line policy optimisation of spoken dialogue systems via live interaction with human subjects. Milica Gasic, Filip Jurcícek, Blaise Thomson, Kai Yu, Steve Young. asru 2011: 312-317.
- Reward estimation for dialogue policy optimisation. Pei-hao Su, Milica Gasic, Steve J. Young. csl, 51:24-43, 2018.
- Neural User Simulation for Corpus-based Policy Optimisation of Spoken Dialogue Systems. Florian Kreyssig, Iñigo Casanueva, Pawel Budzianowski, Milica Gasic. sigdial 2018: 60-69.