Dialog policy optimization for low resource setting using Self-play and Reward based Sampling

Tharindu Madusanka, Durashi Langappuli, Thisara Welmilla, Uthayasanker Thayasivam, Sanath Jayasena. Dialog policy optimization for low resource setting using Self-play and Reward based Sampling. In Minh Le Nguyen, Mai Chi Luong, Sanghoun Song, editors, Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation, PACLIC 2020, Hanoi, Vietnam, October 24-26, 2020, pages 178-187, Association for Computational Linguistics, 2020.
