Group-Level Data Selection for Efficient Pretraining

Zichun Yu, Fei Peng, Jie Lei, Arnold Overwijk, Scott Yih, Chenyan Xiong. Group-Level Data Selection for Efficient Pretraining. In Danielle Belgrave, Cheng Zhang, Laura N. Montoya, Hsuan-Tien Lin, Razvan Pascanu, Piotr Koniusz, Marzyeh Ghassemi, Nancy Chen, Iván Vladimir Meza Ruíz, Arturo Loaiza-Bonilla, editors, Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2025, NeurIPS 2025, San Diego, CA, USA, December 2-7, 2025 / Mexico City, Mexico, November 30 - December 5, 2025. 2025.

Authors

Zichun Yu

Fei Peng

Jie Lei

Arnold Overwijk

Scott Yih

Chenyan Xiong
