- Item Response Theory to Evaluate Speech Synthesis: Beyond Synthetic Speech Difficulty. Chaina Oliveira, Ricardo B. C. Prudêncio. [doi]
- Robustness Testing of Machine Learning Families using Instance-Level IRT-Difficulty. Raül Fabra-Boluda, Cèsar Ferri, Fernando Martínez-Plumed, María José Ramírez-Quintana. [doi]
- Evaluating Object Permanence in Embodied Agents using the Animal-AI Environment. Konstantinos Voudouris, Niall Donnelly, Danaja Rutar, Ryan Burnell, John Burden, José Hernández-Orallo, Lucy Cheke. [doi]
- FERM: A FEature-space Representation Measure for Improved Model Evaluation. Yeu-Shin Fu, Wenbo Ge, Jo Plested. [doi]
- On Young Children's Exploration, Aha! Moments and Explanations in Model Building for Self-Regulated Problem-Solving. Vicky Charisi, Natalia Díaz Rodríguez, Barbara Mawhin, Luis Merino. [doi]
- Evaluating Sports Analytics Models: Challenges, Approaches, and Lessons Learned. Jesse Davis, Lotte Bransen, Laurens Devos, Wannes Meert, Pieter Robberechts, Jan Van Haaren, Maaike Van Roy. [doi]
- The Relevance of Non-Human Errors in Machine Learning. Ricardo Baeza-Yates, Marina Estévez-Almenzar. [doi]
- Reject Before You Run: Small Assessors Anticipate Big Language Models. Lexin Zhou, Fernando Martínez-Plumed, José Hernández-Orallo, Cèsar Ferri, Wout Schellaert. [doi]
- Evaluating Understanding on Conceptual Abstraction Benchmarks. Victor Vikram Odouard, Melanie Mitchell. [doi]
- A Framework for Categorising AI Evaluation Instruments. Anthony G. Cohn, José Hernández-Orallo, Julius Sechang Mboli, Yael Moros-Daval, Zhiliang Xiang, Lexin Zhou. [doi]