Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation

Juno Kim, Denny Wu, Jason D. Lee, Taiji Suzuki. Metastable Dynamics of Chain-of-Thought Reasoning: Provable Benefits of Search, RL and Distillation. In Forty-second International Conference on Machine Learning, ICML 2025, Vancouver, BC, Canada, July 13-19, 2025. OpenReview.net, 2025.

Authors

Juno Kim

Denny Wu

Jason D. Lee

Taiji Suzuki