Binrui Wang, Zikai Wang, Yongping Du, Mingyang Li. Mixed-policy preference optimization with self-generated non-preferred responses and off-policy preference distillation. Neurocomputing, 695:133996, 2026.