From Crowdsourced Data to High-quality Benchmarks: Arena-Hard and Benchbuilder Pipeline

Tianle Li, Wei-Lin Chiang, Evan Frick, Lisa Dunlap, Tianhao Wu, Banghua Zhu, Joseph E. Gonzalez, Ion Stoica. From Crowdsourced Data to High-quality Benchmarks: Arena-Hard and Benchbuilder Pipeline. In Forty-second International Conference on Machine Learning, ICML 2025, Vancouver, BC, Canada, July 13-19, 2025. OpenReview.net, 2025.

Authors

Tianle Li

Wei-Lin Chiang

Evan Frick

Lisa Dunlap

Tianhao Wu

Banghua Zhu

Joseph E. Gonzalez

Ion Stoica
