Position: AI Competitions Provide the Gold Standard for Empirical Rigor in GenAI Evaluation

D. Sculley, William Cukierski, Phil Culliton, Sohier Dane, Maggie Demkin, Ryan Holbrook, Addison Howard, Paul Mooney, Walter Reade, Meg Risdal, Nate Keating. Position: AI Competitions Provide the Gold Standard for Empirical Rigor in GenAI Evaluation. In Forty-second International Conference on Machine Learning, ICML 2025, Vancouver, BC, Canada, July 13-19, 2025 - Position Paper Track. OpenReview.net, 2025.

Authors

D. Sculley

William Cukierski

Phil Culliton

Sohier Dane

Maggie Demkin

Ryan Holbrook

Addison Howard

Paul Mooney

Walter Reade

Meg Risdal

Nate Keating
