The Validity of Evaluation Results: Assessing Concurrence Across Compositionality Benchmarks

Kaiser Sun, Adina Williams, Dieuwke Hupkes. The Validity of Evaluation Results: Assessing Concurrence Across Compositionality Benchmarks. In Jing Jiang, David Reitter, Shumin Deng, editors, Proceedings of the 27th Conference on Computational Natural Language Learning, CoNLL 2023, Singapore, December 6-7, 2023, pages 274-293. Association for Computational Linguistics, 2023.
