PARM: Multi-Objective Test-Time Alignment via Preference-Aware Autoregressive Reward Model

Baijiong Lin, Weisen Jiang, Yuancheng Xu, Hao Chen, Ying-Cong Chen. PARM: Multi-Objective Test-Time Alignment via Preference-Aware Autoregressive Reward Model. In Forty-second International Conference on Machine Learning, ICML 2025, Vancouver, BC, Canada, July 13-19, 2025. OpenReview.net, 2025.

Authors

Baijiong Lin

Weisen Jiang

Yuancheng Xu

Hao Chen

Ying-Cong Chen