Efficient physical operators for cost-based XPath execution

Haris Georgiadis, Minas Charalambides, Vasilis Vassalos. Efficient physical operators for cost-based XPath execution. In Ioana Manolescu, Stefano Spaccapietra, Jens Teubner, Masaru Kitsuregawa, Alain Léger, Felix Naumann, Anastasia Ailamaki, Fatma Özcan, editors, EDBT 2010, 13th International Conference on Extending Database Technology, Lausanne, Switzerland, March 22-26, 2010, Proceedings. Volume 426 of ACM International Conference Proceeding Series, pages 171-182, ACM, 2010. [doi]

Abstract

The creation of a generic and modular query optimization and processing infrastructure can provide significant benefits to XML data management. Key pieces of such an infrastructure are the physical operators that are available to the execution engine, to turn queries into execution plans. Such operators, to be efficient, need to implement sophisticated algorithms for logical XPath or XQuery operations. Moreover, to enable a cost-based optimizer to choose among them correctly, it is also necessary to provide cost models for such operator implementations. In this paper we present two novel families of algorithms for XPath physical operators, called LookUp (LU) and Sort-Merge-based (SM), along with detailed cost models. Our algorithms have significantly better performance compared to existing techniques over any one of a variety of different XML storage systems that provide a set of common primitive access methods. To substantiate the robustness and efficiency of our physical operators, we evaluate their individual performance over four different XML storage engines against operators that implement existing XPath processing techniques. We also demonstrate the performance gains for twig processing of using plans consisting of our operators compared to a state of the art holistic technique, specifically Twig2Stack. Additionally, we evaluate the precision of our cost models, and we conduct an analysis of the sensitivity of our algorithms and cost models to a variety of parameters.