Wenting Zhao, Alexander M. Rush, Tanya Goyal. Challenges in Trustworthy Human Evaluation of Chatbots. In Luis Chiruzzo, Alan Ritter, Lu Wang, editors, Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29 - May 4, 2025, pages 3359-3365. Association for Computational Linguistics, 2025.