Maik Fröbe, Janek Bevendorff, Lukas Gienapp, Michael Völske, Benno Stein, Martin Potthast, Matthias Hagen. CopyCat: Near-Duplicates Within and Between the ClueWeb and the Common Crawl. In Fernando Diaz, Chirag Shah, Torsten Suel, Pablo Castells, Rosie Jones, Tetsuya Sakai, editors, SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, July 11-15, 2021, pages 2398-2404. ACM, 2021.