How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training?

Shiyue Zhang, Vishrav Chaudhary, Naman Goyal, James Cross, Guillaume Wenzek, Mohit Bansal, Francisco Guzmán. How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training? In Kevin Duh, Francisco Guzmán, Stephen Richardson, editors, Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track), AMTA 2022, Orlando, USA, September 12-16, 2022. Pages 97-116, Association for Machine Translation in the Americas, 2022.

Authors

Shiyue Zhang

Vishrav Chaudhary

Naman Goyal

James Cross

Guillaume Wenzek

Mohit Bansal

Francisco Guzmán
