The following publications are possibly variants of this publication:
- MedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models. Yan Cai 0001, Linlin Wang, Ye Wang, Gerard de Melo, Ya Zhang, Yanfeng Wang, Liang He. AAAI 2024: 17709-17717
- Benchmarking Large Language Models on CMExam - A comprehensive Chinese Medical Exam Dataset. Junling Liu, Peilin Zhou, Yining Hua, Dading Chong, Zhongyu Tian, Andrew Liu, Helin Wang, Chenyu You, Zhenhua Guo, Lei Zhu, Michael Lingzhi Li. NeurIPS 2023
- LexEval: A Comprehensive Chinese Legal Benchmark for Evaluating Large Language Models. Haitao Li 0006, You Chen, Qingyao Ai, Yueyue Wu, Ruizhe Zhang 0005, Yiqun Liu 0001. NeurIPS 2024
- WenMind: A Comprehensive Benchmark for Evaluating Large Language Models in Chinese Classical Literature and Language Arts. Jiahuan Cao, Yang Liu, Yongxin Shi, Kai Ding 0009, Lianwen Jin. NeurIPS 2024