Instance-level Randomization: Toward More Stable LLM Evaluations

Yiyang Li, Yonghuang Wu, Ying Luo, Liangtai Sun, Zishu Qin, Lin Qiu, Xuezhi Cao, Xunliang Cai. Instance-level Randomization: Toward More Stable LLM Evaluations. In Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng, editors, Findings of the Association for Computational Linguistics: EMNLP 2025, Suzhou, China, November 4-9, 2025, pages 3411-3425. Association for Computational Linguistics, 2025.

Abstract

Abstract is missing.