A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs

Nevena Lazic, Dong Yin, Mehrdad Farajtabar, Nir Levine, Dilan Görür, Chris Harris, Dale Schuurmans. A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs. In Hugo Larochelle, Marc'Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual. 2020.

Authors

Nevena Lazic

Dong Yin

Mehrdad Farajtabar

Nir Levine

Dilan Görür

Chris Harris

Dale Schuurmans