Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model

Junshu Pan, Wei Shen, Shulin Huang, Qiji Zhou, Yue Zhang. Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model. In Sven Koenig, Chad Jenkins, Matthew E. Taylor, editors, Fortieth AAAI Conference on Artificial Intelligence, Thirty-Eighth Conference on Innovative Applications of Artificial Intelligence, Sixteenth Symposium on Educational Advances in Artificial Intelligence, AAAI 2026, Singapore, January 20-27, 2026. Pages 32646-32654, AAAI Press, 2026.

Authors

Junshu Pan

Wei Shen

Shulin Huang

Qiji Zhou

Yue Zhang
