Processing XPath queries with forward and downward axes over XML streams

Makoto Onizuka. Processing XPath queries with forward and downward axes over XML streams. In Ioana Manolescu, Stefano Spaccapietra, Jens Teubner, Masaru Kitsuregawa, Alain Léger, Felix Naumann, Anastasia Ailamaki, Fatma Özcan, editors, EDBT 2010, 13th International Conference on Extending Database Technology, Lausanne, Switzerland, March 22-26, 2010, Proceedings. Volume 426 of ACM International Conference Proceeding Series, pages 27-38, ACM, 2010.

Abstract

We propose an XPath processing algorithm that efficiently evaluates XPath queries in XP{↓,→,*,[]} over XML streams. An XPath query is expressed with axes, which are binary relations between nodes in XML streams: ‘↓’ denotes the child/descendant axes and ‘→’ denotes the following/following-sibling axes. The proposed algorithm evaluates XPath queries in a single XML parsing pass and outputs the matching fragments of the XML stream as the query results. The difficulty of evaluating XP{↓,→,*,[]} lies in establishing dynamic scope control for the following/following-sibling axes. The algorithm resolves this issue with a double-layered non-deterministic finite automaton (NFA), called Layered NFA. The first-layer NFA is compiled from the XPath query and evaluates sub-queries in XP{↓,→,*}. The second-layer NFA handles the predicate parts. It is maintained dynamically during XML parsing: each of its states is constructed from a pair consisting of the corresponding state in the first-layer automaton and the currently parsed node in the XML stream. Layered NFA achieves O(|D||Q|) time complexity by introducing a state sharing technique, which eliminates redundant transitions and thereby avoids exponential growth in the number of Layered NFA states. We validate the efficiency of the algorithm through empirical experiments and show that Layered NFA is up to four times faster than existing algorithms, and twice as fast on average.
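To make the idea of running an NFA compiled from an XPath query over a stream more concrete, here is a minimal sketch in Python. It is not the paper's Layered NFA: it covers only the downward axes (child and descendant) with name tests and `*`, and it omits the following/following-sibling axes, predicates, and the state sharing technique. The function names `compile_steps` and `match_stream`, the stack-of-state-sets design, and the path-string output are all assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET


def compile_steps(query):
    """Parse a tiny XPath subset such as '//a/b' into (axis, name) steps.

    Hypothetical helper: supports only '/' (child), '//' (descendant),
    element names, and '*'.
    """
    steps = []
    i = 0
    while i < len(query):
        if query.startswith("//", i):
            axis, i = "descendant", i + 2
        elif query.startswith("/", i):
            axis, i = "child", i + 1
        else:
            raise ValueError("expected '/' or '//' at offset %d" % i)
        j = i
        while j < len(query) and query[j] != "/":
            j += 1
        if j == i:
            raise ValueError("missing name test at offset %d" % i)
        steps.append((axis, query[i:j]))
        i = j
    return steps


def match_stream(source, query):
    """Return the paths of elements matching `query` in one parsing pass.

    NFA state k means "the first k steps have matched". The stack holds
    one set of active states per open element, so closing an element
    restores the states that were active in its parent.
    """
    steps = compile_steps(query)
    final = len(steps)
    stack = [{0}]   # state set for the document root
    path = []       # names of the currently open elements
    hits = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
            active = set()
            for state in stack[-1]:
                if state == final:
                    continue  # query already fully matched at an ancestor
                axis, name = steps[state]
                if name == "*" or name == elem.tag:
                    active.add(state + 1)   # advance on a matching name test
                if axis == "descendant":
                    active.add(state)       # self-loop: keep waiting deeper
            stack.append(active)
            if final in active:
                hits.append("/" + "/".join(path))
        else:
            stack.pop()
            path.pop()
            elem.clear()  # release the subtree; the stream is not retained
    return hits


doc = b"<r><a><b/></a><c><b/></c><a><x><b/></x></a></r>"
print(match_stream(io.BytesIO(doc), "//a/b"))
print(match_stream(io.BytesIO(doc), "//a//b"))
```

In this sketch each start tag touches at most one state per query step, so the work per element is bounded by the query size, which is the same flavor of bound as the O(|D||Q|) figure quoted above. Handling the following/following-sibling axes is harder because their scope does not follow the open-element stack, which is the dynamic scope control problem the abstract refers to.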