How Important Is Tokenization in French Medical Masked Language Models?

Yanis Labrak, Adrien Bazoge, Béatrice Daille, Mickael Rouvier, Richard Dufour. How Important Is Tokenization in French Medical Masked Language Models? In Nicoletta Calzolari, Min-Yen Kan, Véronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue, editors, Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC/COLING 2024, 20-25 May, 2024, Torino, Italy. Pages 8223-8234, ELRA and ICCL, 2024.
