From Crowdsourced Data to High-quality Benchmarks: Arena-Hard and Benchbuilder Pipeline

Tianle Li, Wei-Lin Chiang, Evan Frick, Lisa Dunlap, Tianhao Wu, Banghua Zhu, Joseph E. Gonzalez, Ion Stoica. From Crowdsourced Data to High-quality Benchmarks: Arena-Hard and Benchbuilder Pipeline. In Forty-second International Conference on Machine Learning, ICML 2025, Vancouver, BC, Canada, July 13-19, 2025. OpenReview.net, 2025.

@inproceedings{LiCFD0ZGS25,
  title = {From Crowdsourced Data to High-quality Benchmarks: Arena-Hard and Benchbuilder Pipeline},
  author = {Tianle Li and Wei-Lin Chiang and Evan Frick and Lisa Dunlap and Tianhao Wu and Banghua Zhu and Joseph E. Gonzalez and Ion Stoica},
  year = {2025},
  url = {https://openreview.net/forum?id=KfTf9vFvSn},
  researchr = {https://researchr.org/publication/LiCFD0ZGS25},
  cites = {0},
  citedby = {0},
  booktitle = {Forty-second International Conference on Machine Learning, ICML 2025, Vancouver, BC, Canada, July 13-19, 2025},
  publisher = {OpenReview.net},
}