Consolidation of References to Persons in Bibliographic Databases

Nuno Freire, José Luis Borbinha, Bruno Martins. Consolidation of References to Persons in Bibliographic Databases. In George Buchanan, Masood Masoodian, Sally Jo Cunningham, editors, Digital Libraries: Universal and Ubiquitous Access to Information, 11th International Conference on Asian Digital Libraries, ICADL 2008, Bali, Indonesia, December 2-5, 2008. Proceedings. Volume 5362 of Lecture Notes in Computer Science, pages 256-265, Springer, 2008.

Abstract

Entity resolution is the process of determining whether, in a specific context, two or more references correspond to the same entity. In this work, we address this problem for references to persons as they are found in bibliographic data, specifically when consolidating multiple datasets. Our solution follows the extraction, transformation and loading (ETL) process typical of data warehouses. It computes the similarities of the attribute values of the references and employs a decision tree to decide when the references match. We describe the characteristics of these references within bibliographic datasets, and how we explored those characteristics by developing new similarity metrics to improve the quality of the consolidation process. We evaluated our work by designing an experiment with data from four national libraries. The results show that the proposed similarity metrics contribute significantly to the consolidation process.
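
The matching step described in the abstract (attribute similarities fed into a decision tree) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' method: the `PersonRef` fields, the generic string ratio from Python's `difflib` standing in for the paper's new similarity metrics, the thresholds 0.8 and 0.95, and the hand-written tree, which in practice would more plausibly be learned from labelled pairs.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class PersonRef:
    """Hypothetical person reference as found in a bibliographic record."""
    name: str
    birth_year: Optional[int] = None


def normalize(name: str) -> str:
    # Turn the inverted form "Surname, Forename" into "forename surname".
    parts = [p.strip() for p in name.split(",", 1)]
    if len(parts) == 2:
        name = f"{parts[1]} {parts[0]}"
    return " ".join(name.lower().split())


def name_similarity(a: str, b: str) -> float:
    # Generic string similarity in [0, 1]; a stand-in, not the paper's metric.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def year_similarity(a: Optional[int], b: Optional[int]) -> Optional[float]:
    # None means the attribute is missing in at least one reference.
    if a is None or b is None:
        return None
    return 1.0 if a == b else 0.0


def is_match(r1: PersonRef, r2: PersonRef) -> bool:
    """Hand-written decision tree over the similarity values (assumed thresholds)."""
    name_sim = name_similarity(r1.name, r2.name)
    birth_sim = year_similarity(r1.birth_year, r2.birth_year)
    if name_sim < 0.8:
        return False             # names too different
    if birth_sim is None:
        return name_sim >= 0.95  # no dates: require a near-identical name
    return birth_sim == 1.0      # dates present: they must agree
```

For example, `is_match(PersonRef("Pessoa, Fernando", 1888), PersonRef("Fernando Pessoa", 1888))` returns `True`, while the same pair with conflicting birth years is rejected.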