ScIRGen: Synthesize Realistic and Large-Scale RAG Dataset for Scientific Research

Junyong Lin, Lu Dai 0001, Ruiqian Han, Yijie Sui, Ruilin Wang, Xingliang Sun, Qinglin Wu, Min Feng, Hao Liu 0026, Hui Xiong 0001. ScIRGen: Synthesize Realistic and Large-Scale RAG Dataset for Scientific Research. In Luiza Antonie, Jian Pei 0001, Xiaohui Yu 0001, Flavio Chierichetti, Hady W. Lauw, Yizhou Sun, Srinivasan Parthasarathy 0001, editors, Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.2, KDD 2025, Toronto ON, Canada, August 3-7, 2025. pages 5619-5630, ACM, 2025.

Authors

Junyong Lin

Lu Dai 0001

Ruiqian Han

Yijie Sui

Ruilin Wang

Xingliang Sun

Qinglin Wu

Min Feng

Hao Liu 0026

Hui Xiong 0001
