ScIRGen: Synthesize Realistic and Large-Scale RAG Dataset for Scientific Research

Junyong Lin, Lu Dai, Ruiqian Han, Yijie Sui, Ruilin Wang, Xingliang Sun, Qinglin Wu, Min Feng, Hao Liu, Hui Xiong. ScIRGen: Synthesize Realistic and Large-Scale RAG Dataset for Scientific Research. In Luiza Antonie, Jian Pei, Xiaohui Yu, Flavio Chierichetti, Hady W. Lauw, Yizhou Sun, Srinivasan Parthasarathy, editors, Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.2, KDD 2025, Toronto ON, Canada, August 3-7, 2025. pages 5619-5630, ACM, 2025.

@inproceedings{Lin0HSWSWF0025,
  title = {ScIRGen: Synthesize Realistic and Large-Scale RAG Dataset for Scientific Research},
  author = {Junyong Lin and Lu Dai and Ruiqian Han and Yijie Sui and Ruilin Wang and Xingliang Sun and Qinglin Wu and Min Feng and Hao Liu and Hui Xiong},
  year = {2025},
  doi = {10.1145/3711896.3737432},
  url = {https://doi.org/10.1145/3711896.3737432},
  researchr = {https://researchr.org/publication/Lin0HSWSWF0025},
  cites = {0},
  citedby = {0},
  pages = {5619--5630},
  booktitle = {Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.2, KDD 2025, Toronto ON, Canada, August 3-7, 2025},
  editor = {Luiza Antonie and Jian Pei and Xiaohui Yu and Flavio Chierichetti and Hady W. Lauw and Yizhou Sun and Srinivasan Parthasarathy},
  publisher = {ACM},
  isbn = {979-8-4007-1454-2},
}