Dialog policy optimization for low resource setting using Self-play and Reward based Sampling

Tharindu Madusanka, Durashi Langappuli, Thisara Welmilla, Uthayasanker Thayasivam, Sanath Jayasena. Dialog policy optimization for low resource setting using Self-play and Reward based Sampling. In Minh Le Nguyen, Mai Chi Luong, Sanghoun Song, editors, Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation, PACLIC 2020, Hanoi, Vietnam, October 24-26, 2020, pages 178-187. Association for Computational Linguistics, 2020.

Authors

Tharindu Madusanka

Durashi Langappuli

Thisara Welmilla

Uthayasanker Thayasivam

Sanath Jayasena
