A Warm Start and a Clean Crawled Corpus - A Recipe for Good Language Models

Vésteinn Snæbjarnarson, Haukur Barri Símonarson, Pétur Orri Ragnarsson, Svanhvít Lilja Ingólfsdóttir, Haukur Jónsson, Vilhjalmur Thorsteinsson, Hafsteinn Einarsson. A Warm Start and a Clean Crawled Corpus - A Recipe for Good Language Models. In Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis, editors, Proceedings of the Thirteenth Language Resources and Evaluation Conference, LREC 2022, Marseille, France, 20-25 June 2022. pages 4356-4366, European Language Resources Association, 2022.

@inproceedings{SnaebjarnarsonS22,
  title = {A Warm Start and a Clean Crawled Corpus - A Recipe for Good Language Models},
  author = {Vésteinn Snæbjarnarson and Haukur Barri Símonarson and Pétur Orri Ragnarsson and Svanhvít Lilja Ingólfsdóttir and Haukur Jónsson and Vilhjalmur Thorsteinsson and Hafsteinn Einarsson},
  year = {2022},
  url = {https://aclanthology.org/2022.lrec-1.464},
  researchr = {https://researchr.org/publication/SnaebjarnarsonS22},
  cites = {0},
  citedby = {0},
  pages = {4356--4366},
  booktitle = {Proceedings of the Thirteenth Language Resources and Evaluation Conference, LREC 2022, Marseille, France, 20-25 June 2022},
  editor = {Nicoletta Calzolari and Frédéric Béchet and Philippe Blache and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Hélène Mazo and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association},
}