Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation

Siyuan Wang, Zhuohan Long, Zhihao Fan, Xuanjing Huang, Zhongyu Wei. Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation. In Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert, editors, Proceedings of the 31st International Conference on Computational Linguistics, COLING 2025, Abu Dhabi, UAE, January 19-24, 2025. Pages 3310-3328, Association for Computational Linguistics, 2025.
