How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models

Phillip Rust, Jonas Pfeiffer, Ivan Vulic, Sebastian Ruder, Iryna Gurevych. How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models. In Chengqing Zong, Fei Xia, Wenjie Li 0002, Roberto Navigli, editors, Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, August 1-6, 2021. pages 3118-3135, Association for Computational Linguistics, 2021.

@inproceedings{RustPVRG20,
  title = {How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models},
  author = {Phillip Rust and Jonas Pfeiffer and Ivan Vulic and Sebastian Ruder and Iryna Gurevych},
  year = {2021},
  url = {https://aclanthology.org/2021.acl-long.243},
  researchr = {https://researchr.org/publication/RustPVRG20},
  cites = {0},
  citedby = {0},
  pages = {3118-3135},
  booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, August 1-6, 2021},
  editor = {Chengqing Zong and Fei Xia and Wenjie Li 0002 and Roberto Navigli},
  publisher = {Association for Computational Linguistics},
  isbn = {978-1-954085-52-7},
}
