Sample Policy Gradient: A Competitive Policy Optimisation Method for Off-Policy Reinforcement Learning

Athanasios Trantas. Sample Policy Gradient: A Competitive Policy Optimisation Method for Off-Policy Reinforcement Learning. In Ana Paula Rocha, Mattias Wahde, H. Jaap van den Herik, editors, Proceedings of the 18th International Conference on Agents and Artificial Intelligence, ICAART 2026 - Volume 4, Marbella, Spain, March 5-7, 2026. Pages 3627-3635, SCITEPRESS, 2026.

Abstract

Abstract is missing.