Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs

Naman Agarwal, Syomantak Chaudhuri, Prateek Jain, Dheeraj Mysore Nagaraj, Praneeth Netrapalli. Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net, 2022.
