ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search

Dan Zhang, Sining Zhoubian, Ziniu Hu, Yisong Yue, Yuxiao Dong, Jie Tang. ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search. In Amir Globerson, Lester Mackey, Danielle Belgrave, Angela Fan, Ulrich Paquet, Jakub M. Tomczak, Cheng Zhang, editors, Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10 - 15, 2024.

Authors

Dan Zhang

Sining Zhoubian

Ziniu Hu

Yisong Yue

Yuxiao Dong

Jie Tang