Bootstrapping Language Models with DPO Implicit Rewards

Changyu Chen, Zichen Liu, Chao Du, Tianyu Pang, Qian Liu 0012, Arunesh Sinha, Pradeep Varakantham, Min Lin. Bootstrapping Language Models with DPO Implicit Rewards. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net, 2025. [doi]

Abstract

Abstract is missing.