Separating Grains from the Chaff: Using Data Filtering to Improve Multilingual Translation for Low-Resourced African Languages

Idris Abdulmumin, Michael Beukman, Jesujoba O. Alabi, Chris Chinenye Emezue, Everlyn Chimoto, Tosin P. Adewumi, Shamsuddeen Hassan Muhammad, Mofetoluwa Adeyemi, Oreen Yousuf, Sahib Singh, Tajuddeen Gwadabe. Separating Grains from the Chaff: Using Data Filtering to Improve Multilingual Translation for Low-Resourced African Languages. In Philipp Koehn, Loïc Barrault, Ondrej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-Jussà, Christian Federmann, Mark Fishel, Alexander Fraser, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno-Yepes, Tom Kocmi, André Martins, Makoto Morishita, Christof Monz, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri, Aurélie Névéol, Mariana Neves, Martin Popel, Marco Turchi, Marcos Zampieri, editors, Proceedings of the Seventh Conference on Machine Translation, WMT 2022, Abu Dhabi, United Arab Emirates (Hybrid), December 7-8, 2022. pages 1001-1014, Association for Computational Linguistics, 2022.

@inproceedings{AbdulmuminBAECAMAYSG22,
  title = {Separating Grains from the Chaff: Using Data Filtering to Improve Multilingual Translation for Low-Resourced African Languages},
  author = {Idris Abdulmumin and Michael Beukman and Jesujoba O. Alabi and Chris Chinenye Emezue and Everlyn Chimoto and Tosin P. Adewumi and Shamsuddeen Hassan Muhammad and Mofetoluwa Adeyemi and Oreen Yousuf and Sahib Singh and Tajuddeen Gwadabe},
  year = {2022},
  url = {https://aclanthology.org/2022.wmt-1.98},
  researchr = {https://researchr.org/publication/AbdulmuminBAECAMAYSG22},
  cites = {0},
  citedby = {0},
  pages = {1001--1014},
  booktitle = {Proceedings of the Seventh Conference on Machine Translation, WMT 2022, Abu Dhabi, United Arab Emirates (Hybrid), December 7-8, 2022},
  editor = {Philipp Koehn and Loïc Barrault and Ondrej Bojar and Fethi Bougares and Rajen Chatterjee and Marta R. Costa-Jussà and Christian Federmann and Mark Fishel and Alexander Fraser and Markus Freitag and Yvette Graham and Roman Grundkiewicz and Paco Guzman and Barry Haddow and Matthias Huck and Antonio Jimeno-Yepes and Tom Kocmi and André Martins and Makoto Morishita and Christof Monz and Masaaki Nagata and Toshiaki Nakazawa and Matteo Negri and Aurélie Névéol and Mariana Neves and Martin Popel and Marco Turchi and Marcos Zampieri},
  publisher = {Association for Computational Linguistics},
  isbn = {978-1-959429-29-6},
}