Skyler Seto, Maartje ter Hoeve, Maureen de Seyssel, David Grangier. Assessing the Role of Data Quality in Training Bilingual Language Models. In Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng, editors, Findings of the Association for Computational Linguistics: EMNLP 2025, Suzhou, China, November 4-9, 2025, pages 22694-22720. Association for Computational Linguistics, 2025.