Assessing the Role of Data Quality in Training Bilingual Language Models

Skyler Seto, Maartje ter Hoeve, Maureen de Seyssel, David Grangier. Assessing the Role of Data Quality in Training Bilingual Language Models. In Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng, editors, Findings of the Association for Computational Linguistics: EMNLP 2025, Suzhou, China, November 4-9, 2025, pages 22694-22720. Association for Computational Linguistics, 2025.

Authors

Skyler Seto

Maartje ter Hoeve

Maureen de Seyssel

David Grangier