CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs

Ahmed El-Kishky, Vishrav Chaudhary, Francisco Guzmán, Philipp Koehn. CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs. In Bonnie Webber, Trevor Cohn, Yulan He, Yang Liu, editors, Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020, pages 5960-5969. Association for Computational Linguistics, 2020.

@inproceedings{El-KishkyCGK20,
  title = {CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs},
  author = {Ahmed El-Kishky and Vishrav Chaudhary and Francisco Guzmán and Philipp Koehn},
  year = {2020},
  url = {https://www.aclweb.org/anthology/2020.emnlp-main.480/},
  researchr = {https://researchr.org/publication/El-KishkyCGK20},
  cites = {0},
  citedby = {0},
  pages = {5960--5969},
  booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020},
  editor = {Bonnie Webber and Trevor Cohn and Yulan He and Yang Liu},
  publisher = {Association for Computational Linguistics},
  isbn = {978-1-952148-60-6},
}