Kitten: a tool for normalizing HTML and extracting its textual content

Mathieu-Henri Falco, Véronique Moriceau, Anne Vilnat. Kitten: a tool for normalizing HTML and extracting its textual content. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Ugur Dogan, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, editors, Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012), Istanbul, Turkey, May 23-25, 2012. pages 2261-2267, European Language Resources Association (ELRA), 2012.
