Position: AI Competitions Provide the Gold Standard for Empirical Rigor in GenAI Evaluation

D. Sculley, William Cukierski, Phil Culliton, Sohier Dane, Maggie Demkin, Ryan Holbrook, Addison Howard, Paul Mooney, Walter Reade, Meg Risdal, Nate Keating. Position: AI Competitions Provide the Gold Standard for Empirical Rigor in GenAI Evaluation. In Forty-second International Conference on Machine Learning, ICML 2025, Vancouver, BC, Canada, July 13-19, 2025 - Position Paper Track. OpenReview.net, 2025. [doi]

References

No references recorded for this publication.

Cited by

No citations of this publication recorded.