Learning to Better Search with Language Models via Guided Reinforced Self-Training

Seungyong Moon, Bumsoo Park, Hyun Oh Song. Learning to Better Search with Language Models via Guided Reinforced Self-Training. In Danielle Belgrave, Cheng Zhang, Laura N. Montoya, Hsuan-Tien Lin, Razvan Pascanu, Piotr Koniusz, Marzyeh Ghassemi, Nancy Chen, Iván Vladimir Meza Ruíz, Arturo Loaiza-Bonilla, editors, Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, NeurIPS 2025, San Diego, CA, USA, December 2-7, 2025 / Mexico City, Mexico, November 30 - December 5, 2025. 2025.

Authors

Seungyong Moon

Bumsoo Park

Hyun Oh Song
