Garbage In, Reasoning Out? Why Benchmark Scores are Unreliable and What to Do About It

Seyed Mahed Mousavi, Edoardo Cecchinato, Lucia Hornikova, Giuseppe Riccardi. Garbage In, Reasoning Out? Why Benchmark Scores are Unreliable and What to Do About It. In Vera Demberg, Kentaro Inui, Lluís Marquez, editors, Findings of the Association for Computational Linguistics: EACL 2026, Rabat, Morocco, March 24-29, 2026, pages 1747-1759. Association for Computational Linguistics, 2026.
