Finding Replicable Human Evaluations via Stable Ranking Probability

Parker Riley, Daniel Deutsch, George F. Foster, Viresh Ratnakar, Ali Dabirmoghaddam, Markus Freitag. Finding Replicable Human Evaluations via Stable Ranking Probability. In Kevin Duh, Helena Gómez-Adorno, Steven Bethard, editors, Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), NAACL 2024, Mexico City, Mexico, June 16-21, 2024. Pages 4908-4919, Association for Computational Linguistics, 2024.
