A Warm Start and a Clean Crawled Corpus - A Recipe for Good Language Models

Vésteinn Snæbjarnarson, Haukur Barri Símonarson, Pétur Orri Ragnarsson, Svanhvít Lilja Ingólfsdóttir, Haukur Jónsson, Vilhjalmur Thorsteinsson, Hafsteinn Einarsson. A Warm Start and a Clean Crawled Corpus - A Recipe for Good Language Models. In Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis, editors, Proceedings of the Thirteenth Language Resources and Evaluation Conference, LREC 2022, Marseille, France, 20-25 June 2022. pages 4356-4366, European Language Resources Association, 2022.

Authors

Vésteinn Snæbjarnarson

Haukur Barri Símonarson

Pétur Orri Ragnarsson

Svanhvít Lilja Ingólfsdóttir

Haukur Jónsson

Vilhjalmur Thorsteinsson

Hafsteinn Einarsson