XEM: Managing the evolution of XML Documents

Hong Su, Diane Kramer, Li Chen, Kajal T. Claypool, Elke A. Rundensteiner. XEM: Managing the evolution of XML Documents. In Karl Aberer, Ling Liu, editors, Eleventh International Workshop on Research Issues in Data Engineering: Document Management for Data Intensive Business and Scientific Applications, Heidelberg, Germany, 1-2 April 2001. pages 103-110, IEEE Computer Society, 2001.

Abstract

As information on the World Wide Web continues to proliferate at an astounding rate, the Extensible Markup Language (XML) has been emerging as a standard format for data representation on the web. In many applications, specific document type definitions (DTDs) are designed to enforce a semantically agreed-upon structure on the XML documents being managed. However, both the data and the structure of XML documents tend to change over time for a multitude of reasons, including correcting design errors in the DTD, expanding the application scope over time, or accounting for the merging of several businesses into one. Yet most current software tools that enable the use of XML do not provide explicit support for such data or schema changes. To address this gap, we put forth the first solution framework, called XML Evolution Manager (XEM), to manage the evolution of XML. XEM provides a minimal yet complete taxonomy of basic change primitives. These primitives, classified as either data changes or schema changes, are consistency-preserving, i.e., (1) for a data change, they ensure that the modified XML document conforms to its DTD both in structure and constraints; and (2) for a schema change, they ensure that the new DTD is a valid DTD and that all existing XML documents are transformed to conform to the modified DTD. We prove the completeness of the taxonomy in terms of DTD transformation. To verify the feasibility of our XEM approach, we have implemented a working prototype system using PSE Pro as our backend storage system.
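
The two consistency guarantees described in the abstract can be illustrated with a small sketch. It is not XEM's implementation or API: the toy schema format (a dictionary standing in for a DTD), the quantifiers "1" and "*", and the function names conforms, apply_data_change, and add_required_child are all assumptions made for illustration, and the real taxonomy of primitives is defined in the paper itself.

```python
import copy
import xml.etree.ElementTree as ET

# Toy stand-in for a DTD (an assumption, not XEM's data model):
# element name -> ordered list of (child name, quantifier), where
# "1" means exactly one occurrence and "*" means zero or more.


def conforms(elem, schema):
    """Return True if elem and all its descendants match the toy schema."""
    if elem.tag not in schema:
        return False
    children = list(elem)
    pos = 0
    for name, quant in schema[elem.tag]:
        count = 0
        while pos < len(children) and children[pos].tag == name:
            count += 1
            pos += 1
        if quant == "1" and count != 1:
            return False
    if pos != len(children):  # leftover children the content model does not allow
        return False
    return all(conforms(child, schema) for child in children)


def apply_data_change(root, schema, change):
    """Hypothetical data change: apply `change` to a copy, reject it if invalid."""
    candidate = copy.deepcopy(root)
    change(candidate)
    if not conforms(candidate, schema):
        raise ValueError("data change would violate the schema")
    return candidate


def add_required_child(schema, docs, parent, child, default_text):
    """Hypothetical schema change: append a required child to `parent`
    and update every existing document so it conforms to the new schema."""
    new_schema = {name: list(model) for name, model in schema.items()}
    new_schema[parent].append((child, "1"))
    new_schema.setdefault(child, [])
    for root in docs:
        for node in list(root.iter(parent)):
            ET.SubElement(node, child).text = default_text
    return new_schema


schema = {"book": [("title", "1"), ("author", "*")], "title": [], "author": []}
doc = ET.fromstring("<book><title>XML</title><author>A</author></book>")

# Schema change: the document is transformed along with the schema.
new_schema = add_required_child(schema, [doc], "book", "year", "unknown")
```

In this sketch a data change is validated on a copy, so a rejected change leaves the original document untouched, while a schema change rewrites the existing documents in the same step that produces the new schema. A real DTD content model is richer than this (choice groups, "+" and "?" quantifiers, attributes), so the check above covers only the simplest sequence case.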