Tokenization and Morphology in Multilingual Language Models: A Comparative Analysis of mT5 and ByT5

Thao Anh Dang, Limor Raviv, Lukas Galke. Tokenization and Morphology in Multilingual Language Models: A Comparative Analysis of mT5 and ByT5. In Mourad Abbas, Tariq Yousef, Lukas Galke, editors, Proceedings of the 8th International Conference on Natural Language and Speech Processing, ICNLSP 2025, Southern Denmark University, Odense, Denmark, August 25-27, 2025, pages 242-257. Association for Computational Linguistics, 2025.