Smaller Can Be Better: Efficient Data Selection for Pre-training Models

Guang Fang, Shihui Wang, Mingxin Wang, Yulan Yang, Hao Huang. Smaller Can Be Better: Efficient Data Selection for Pre-training Models. In Wenjie Zhang, Anthony K. H. Tung, Zhonglong Zheng, Zhengyi Yang, Xiaoyang Wang, Hongjie Guo, editors, Web and Big Data - 8th International Joint Conference, APWeb-WAIM 2024, Jinhua, China, August 30 - September 1, 2024, Proceedings, Part I. Volume 14961 of Lecture Notes in Computer Science, pages 327-342, Springer, 2024.
