Smaller Can Be Better: Efficient Data Selection for Pre-training Models

Guang Fang, Shihui Wang, Mingxin Wang, Yulan Yang, Hao Huang. Smaller Can Be Better: Efficient Data Selection for Pre-training Models. In Wenjie Zhang, Anthony K. H. Tung, Zhonglong Zheng, Zhengyi Yang, Xiaoyang Wang, Hongjie Guo, editors, Web and Big Data - 8th International Joint Conference, APWeb-WAIM 2024, Jinhua, China, August 30 - September 1, 2024, Proceedings, Part I. Volume 14961 of Lecture Notes in Computer Science, pages 327-342, Springer, 2024.

@inproceedings{FangWWYH24,
  title = {Smaller Can Be Better: Efficient Data Selection for Pre-training Models},
  author = {Guang Fang and Shihui Wang and Mingxin Wang and Yulan Yang and Hao Huang},
  year = {2024},
  doi = {10.1007/978-981-97-7232-2_22},
  url = {https://doi.org/10.1007/978-981-97-7232-2_22},
  researchr = {https://researchr.org/publication/FangWWYH24},
  cites = {0},
  citedby = {0},
  pages = {327--342},
  booktitle = {Web and Big Data - 8th International Joint Conference, APWeb-WAIM 2024, Jinhua, China, August 30 - September 1, 2024, Proceedings, Part I},
  editor = {Wenjie Zhang and Anthony K. H. Tung and Zhonglong Zheng and Zhengyi Yang and Xiaoyang Wang and Hongjie Guo},
  volume = {14961},
  series = {Lecture Notes in Computer Science},
  publisher = {Springer},
  isbn = {978-981-97-7232-2},
}