Uralic Language Identification (ULI) 2020 shared task dataset and the Wanca 2017 corpora

Tommi Jauhiainen, Heidi Jauhiainen, Niko Partanen, Krister Lindén. Uralic Language Identification (ULI) 2020 shared task dataset and the Wanca 2017 corpora. In Marcos Zampieri, Preslav Nakov, Nikola Ljubešić, Jörg Tiedemann, Yves Scherrer, editors, Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, VarDial@COLING 2020, Barcelona, Spain (Online), December 13, 2020, pages 173-185. International Committee on Computational Linguistics (ICCL), 2020.

Abstract

Abstract is missing.