Bootstrapping Language Models with DPO Implicit Rewards

Changyu Chen, Zichen Liu, Chao Du, Tianyu Pang, Qian Liu 0012, Arunesh Sinha, Pradeep Varakantham, Min Lin. Bootstrapping Language Models with DPO Implicit Rewards. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net, 2025.

Authors

Changyu Chen

Zichen Liu

Chao Du

Tianyu Pang

Qian Liu 0012

Arunesh Sinha

Pradeep Varakantham

Min Lin
