The goal of the semantic web is to construct a web of semantically annotated information that allows machines to find, combine, and process information. This vision requires ontologies that define the relevant terms and their relations. While the semantic web has been making only slow progress, another form of annotation has taken over the (social) web on sites such as Flickr. Tagging is a lightweight form of semantic annotation in which simple terms (keywords) are associated with an artifact. Tags are chosen on the fly by users and do not have to follow a predefined ontology. As a result, tag systems (folksonomies) can develop quickly and keep up with emerging terminology. Furthermore, tags do not enforce a Linnaean hierarchy on terms. On the other hand, tagging systems suffer from ambiguity, synonyms, and classification-level mismatches. Despite these problems, tagging systems seem to be very successful on the social web. (Are there studies that indicate that tagging systems do indeed help users find stuff more effectively?)
This analysis is the introduction to the paper about TagFusion, which notes that a big downside of the tag collections of popular social media sites is that they do not provide the interoperability envisioned for the (semantic) web. That is, tags assigned to artifacts on one site cannot be linked to tags on other sites. The paper proposes to address this problem by means of a meta-tagging facility that collects tags from various sites and links them, allowing various forms of data mining to be applied.
While the idea is worthy, it is not clear whether TagFusion will help. First, it seems to be an idea rather than a working system; the project page does not link to an implementation. But more importantly, as the paper quotes Tim Berners-Lee, the web is more of a social construction than a technological one. The architecture of TagFusion seems to require sites with tag collections to publish their tags to the TagFusion system, so the enterprise can only work if those sites collaborate (e.g. by providing webhooks). To achieve better interoperability, it would be useful if more sites provided an API that allows arbitrary sites to hook into their content. (It is not yet clear to me what such an API would look like, though. But I am thinking about an API for various aspects of researchr, including tagging; suggestions are welcome.)
The paper describes the algorithm used by Amazon.com to recommend other items based on the items in a customer's shopping cart. The paper first reviews collaborative filtering, cluster models, and search-based methods for computing recommendations and discusses why these either do not scale to Amazon.com-size catalogs or do not provide good recommendations. The paper then presents the item-to-item collaborative filtering method used by Amazon. An offline algorithm computes the similarity of items (instead of customers) by finding items purchased by the same customers. Based on this similarity relation, an online algorithm finds items that are similar to the items that a customer is purchasing and orders them by rating and/or relevance.
A useful and clear introduction to recommendation algorithms, but not directly (literally) applicable to recommendations in digital libraries.
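To make the offline/online split of item-to-item collaborative filtering concrete, here is a minimal sketch in Python. It is not Amazon's implementation: the function names `build_item_similarity` and `recommend`, the toy purchase data, the use of cosine similarity over binary purchase vectors, and the ranking by summed similarity are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def build_item_similarity(purchases):
    """Offline step (illustrative): cosine similarity between items.

    `purchases` maps a customer id to the set of item ids that customer bought.
    Each item is treated as a binary vector over customers (an assumption).
    """
    buyers = defaultdict(int)     # item -> number of customers who bought it
    co_bought = defaultdict(int)  # (item_a, item_b) -> customers who bought both
    for items in purchases.values():
        for item in items:
            buyers[item] += 1
        # Count every pair of items that appears in the same customer's history.
        for a, b in combinations(sorted(items), 2):
            co_bought[(a, b)] += 1

    similarity = defaultdict(dict)
    for (a, b), both in co_bought.items():
        score = both / sqrt(buyers[a] * buyers[b])
        similarity[a][b] = score
        similarity[b][a] = score
    return similarity


def recommend(cart, similarity, top_n=3):
    """Online step (illustrative): rank items similar to those in the cart."""
    scores = defaultdict(float)
    for item in cart:
        for other, score in similarity.get(item, {}).items():
            if other not in cart:
                scores[other] += score
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical toy data: four customers and their purchase histories.
purchases = {
    "alice": {"book", "lamp"},
    "bob": {"book", "lamp", "desk"},
    "carol": {"book", "pen"},
    "dave": {"desk", "chair"},
}
similarity = build_item_similarity(purchases)
print(recommend({"book"}, similarity))  # prints ['lamp', 'pen', 'desk']
```

The expensive pairwise computation happens once, offline, while the online step only looks up precomputed neighbours of the items in the cart, which is what lets the approach scale with catalog size in the way the paper argues.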