Delay-Adapted Policy Optimization and Improved Regret for Adversarial MDP with Delayed Bandit Feedback

Tal Lancewicki, Aviv Rosenberg, Dmitry Sotnikov. Delay-Adapted Policy Optimization and Improved Regret for Adversarial MDP with Delayed Bandit Feedback. In Andreas Krause, Emma Brunskill, KyungHyun Cho, Barbara Engelhardt, Sivan Sabato, Jonathan Scarlett, editors, International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA. Volume 202 of Proceedings of Machine Learning Research, pages 18482-18534, PMLR, 2023.

@inproceedings{Lancewicki0S23,
  title = {Delay-Adapted Policy Optimization and Improved Regret for Adversarial MDP with Delayed Bandit Feedback},
  author = {Tal Lancewicki and Aviv Rosenberg and Dmitry Sotnikov},
  year = {2023},
  url = {https://proceedings.mlr.press/v202/lancewicki23a.html},
  researchr = {https://researchr.org/publication/Lancewicki0S23},
  cites = {0},
  citedby = {0},
  pages = {18482--18534},
  booktitle = {International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA},
  editor = {Andreas Krause and Emma Brunskill and KyungHyun Cho and Barbara Engelhardt and Sivan Sabato and Jonathan Scarlett},
  volume = {202},
  series = {Proceedings of Machine Learning Research},
  publisher = {PMLR},
}