How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models

Phillip Rust, Jonas Pfeiffer, Ivan Vulic, Sebastian Ruder, Iryna Gurevych. How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models. In Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli, editors, Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021 (Volume 1: Long Papers), Virtual Event, August 1-6, 2021, pages 3118-3135. Association for Computational Linguistics, 2021.

Authors

Phillip Rust

Jonas Pfeiffer

Ivan Vulic

Sebastian Ruder

Iryna Gurevych
