Challenges in Trustworthy Human Evaluation of Chatbots

Wenting Zhao, Alexander M. Rush, Tanya Goyal. Challenges in Trustworthy Human Evaluation of Chatbots. In Luis Chiruzzo, Alan Ritter, Lu Wang, editors, Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29 - May 4, 2025, pages 3359-3365. Association for Computational Linguistics, 2025.
