The following publications are possibly variants of this publication:
- Decomposed Deep Q-Network for Coherent Task-Oriented Dialogue Policy Learning. Yangyang Zhao, Kai Yin, Zhenyu Wang 0001, Mehdi Dastani, Shihan Wang 0001. taslp, 32:1380-1391, 2024. [doi]
- Dynamic Reward-Based Dueling Deep Dyna-Q: Robust Policy Learning in Noisy Environments. Yangyang Zhao, Zhenyu Wang, Kai Yin, Rui Zhang 0046, Zhenhua Huang, Pei Wang. AAAI 2020: 9676-9684 [doi]