Position: AI Competitions Provide the Gold Standard for Empirical Rigor in GenAI Evaluation

D. Sculley, William Cukierski, Phil Culliton, Sohier Dane, Maggie Demkin, Ryan Holbrook, Addison Howard, Paul Mooney, Walter Reade, Meg Risdal, Nate Keating. Position: AI Competitions Provide the Gold Standard for Empirical Rigor in GenAI Evaluation. In Forty-second International Conference on Machine Learning, ICML 2025, Vancouver, BC, Canada, July 13-19, 2025 - Position Paper Track. OpenReview.net, 2025.

Abstract

Abstract is missing.