What happens if you treat ordinal ratings as interval data? Human evaluations in NLP are even more under-powered than you think

David M. Howcroft, Verena Rieser. What happens if you treat ordinal ratings as interval data? Human evaluations in NLP are even more under-powered than you think. In Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih, editors, Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021. Pages 8932-8939, Association for Computational Linguistics, 2021.

Authors

David M. Howcroft


Verena Rieser
