Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs

Xuan Zhang, Chao Du, Tianyu Pang, Qian Liu, Wei Gao, Min Lin. Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. In Amir Globerson, Lester Mackey, Danielle Belgrave, Angela Fan, Ulrich Paquet, Jakub M. Tomczak, Cheng Zhang, editors, Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10-15, 2024.

Authors

Xuan Zhang

Chao Du

Tianyu Pang

Qian Liu

Wei Gao

Min Lin