Reward-guided direct preference optimization

Zhe Ding, Su Pan, Yongpan Zhang, Hui Ji, Cheng Ding. Reward-guided direct preference optimization. Expert Syst. Appl., 299:130295, 2026.
