CorpusNÓS: A massive Galician corpus for training large language models

Iria de-Dios-Flores, Silvia Paniagua Suárez, Cristina Carbajal-Pérez, Daniel Bardanca Outeiriño, Marcos García, Pablo Gamallo. CorpusNÓS: A massive Galician corpus for training large language models. In Pablo Gamallo, Daniela Claro, António Teixeira, Livy Real, Marcos García, Hugo Gonçalo Oliveira, Raquel Amaro, editors, Proceedings of the 16th International Conference on Computational Processing of Portuguese (PROPOR 2024), Santiago de Compostela, Galicia/Spain, 12-15 March 2024, pages 593-599. Association for Computational Linguistics, 2024.
