Evaluating the Performance of Large Language Models via Debates

Behrad Moniri, Hamed Hassani, Edgar Dobriban. Evaluating the Performance of Large Language Models via Debates. In Luis Chiruzzo, Alan Ritter, Lu Wang, editors, Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29 - May 4, 2025, pages 2040-2075. Association for Computational Linguistics, 2025.
