SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects

David Ifeoluwa Adelani, Hannah Liu, Xiaoyu Shen, Nikita Vassilyev, Jesujoba O. Alabi, Yanke Mao, Haonan Gao, En-Shiun Annie Lee. SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects. In Yvette Graham, Matthew Purver, editors, Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024 - Volume 1: Long Papers, St. Julian's, Malta, March 17-22, 2024, pages 226-245. Association for Computational Linguistics, 2024.
