Disentangling Length from Quality in Direct Preference Optimization

Ryan Park, Rafael Rafailov, Stefano Ermon, Chelsea Finn. Disentangling Length from Quality in Direct Preference Optimization. In Lun-Wei Ku, Andre Martins, Vivek Srikumar, editors, Findings of the Association for Computational Linguistics, ACL 2024, Bangkok, Thailand and virtual meeting, August 11-16, 2024, pages 4998-5017. Association for Computational Linguistics, 2024.

Authors

Ryan Park

Rafael Rafailov

Stefano Ermon

Chelsea Finn