Finding Viable Seed URLs for Web Corpora: A Scouting Approach and Comparative Study of Available Sources

Adrien Barbaresi. Finding Viable Seed URLs for Web Corpora: A Scouting Approach and Comparative Study of Available Sources. In Felix Bildhauer and Roland Schäfer, editors, Proceedings of the 9th Web as Corpus Workshop, WaC@EACL 2014, Gothenburg, Sweden, April 26, 2014, pages 1-8. Association for Computational Linguistics, 2014.