Softmax policy gradient methods can take exponential time to converge

Gen Li, Yuting Wei, Yuejie Chi, Yuxin Chen. Softmax policy gradient methods can take exponential time to converge. Math. Program., 201(1):707-802, 2023.
