Provably Robust DPO: Aligning Language Models with Noisy Feedback

Sayak Ray Chowdhury, Anush Kini, Nagarajan Natarajan. Provably Robust DPO: Aligning Language Models with Noisy Feedback. In Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27, 2024. OpenReview.net, 2024.
