How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training?

Shiyue Zhang, Vishrav Chaudhary, Naman Goyal, James Cross, Guillaume Wenzek, Mohit Bansal, Francisco Guzmán. How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training? In Kevin Duh, Francisco Guzmán, Stephen Richardson, editors, Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track), AMTA 2022, Orlando, USA, September 12-16, 2022. Pages 97-116, Association for Machine Translation in the Americas, 2022.
