Dialog policy optimization for low resource setting using Self-play and Reward based Sampling

Tharindu Madusanka, Durashi Langappuli, Thisara Welmilla, Uthayasanker Thayasivam, Sanath Jayasena. Dialog policy optimization for low resource setting using Self-play and Reward based Sampling. In Minh Le Nguyen, Mai Chi Luong, Sanghoun Song, editors, Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation, PACLIC 2020, Hanoi, Vietnam, October 24-26, 2020. pages 178-187, Association for Computational Linguistics, 2020.

@inproceedings{MadusankaLWTJ20,
  title = {Dialog policy optimization for low resource setting using Self-play and Reward based Sampling},
  author = {Tharindu Madusanka and Durashi Langappuli and Thisara Welmilla and Uthayasanker Thayasivam and Sanath Jayasena},
  year = {2020},
  url = {https://www.aclweb.org/anthology/2020.paclic-1.21/},
  researchr = {https://researchr.org/publication/MadusankaLWTJ20},
  cites = {0},
  citedby = {0},
  pages = {178--187},
  booktitle = {Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation, PACLIC 2020, Hanoi, Vietnam, October 24-26, 2020},
  editor = {Minh Le Nguyen and Mai Chi Luong and Sanghoun Song},
  publisher = {Association for Computational Linguistics},
}