Rajesh Bordawekar, Lipyeow Lim, Anastasios Kementsietsidis, Bryant Wei-Lun Kok. Statistics-based parallelization of XPath queries in shared memory systems. In Ioana Manolescu, Stefano Spaccapietra, Jens Teubner, Masaru Kitsuregawa, Alain Léger, Felix Naumann, Anastasia Ailamaki, Fatma Özcan, editors, EDBT 2010, 13th International Conference on Extending Database Technology, Lausanne, Switzerland, March 22-26, 2010, Proceedings. Volume 426 of ACM International Conference Proceeding Series, pages 159-170, ACM, 2010.
The wide availability of commodity multi-core systems presents an
opportunity to address the latency issues that have plagued XML query
processing. However, simply executing multiple XML queries over
multiple cores merely addresses the throughput issue: intra-query
parallelization is needed to exploit multiple processing cores for
better latency. Toward this goal, this paper investigates the
parallelization of individual XPath queries over shared-address-space
multi-core processors. Much previous work on parallelizing XPath in a
distributed setting failed to exploit the shared memory parallelism of
multi-core systems. We propose a novel, end-to-end parallelization
framework that determines the optimal way of parallelizing an XML
query. The decision follows a statistics-based approach that
relies on both the query specifics and the data statistics. At each
stage of the parallelization process, we evaluate three alternative
approaches, namely, data-, query-, and hybrid-partitioning. For a
given XPath query, our parallelization algorithm uses XML statistics to
estimate the relative efficiencies of these different alternatives and
find an optimal parallel XPath processing plan. Our experiments using
well-known XML documents validate our parallel cost model and
optimization framework, and demonstrate that it is possible to
accelerate XPath processing using commodity multi-core systems.
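
To make the idea of choosing a plan from statistics concrete, here is a minimal sketch, not the paper's cost model. It assumes a toy model in which work is proportional to the number of nodes visited, and it compares a serial plan with data partitioning (splitting the nodes matched at one step across cores) and query partitioning (evaluating independent predicate branches of one step on separate cores). Hybrid partitioning is left out for brevity. The names `StepStats`, `choose_plan`, and the cost formulas are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class StepStats:
    """Hypothetical statistics for one XPath location step."""
    name: str
    cardinality: int   # estimated number of nodes matched by this step
    branches: int = 1  # independent predicate branches evaluated at this step


def step_work(step):
    # Toy assumption: each branch touches every matched node once.
    return step.cardinality * step.branches


def serial_cost(steps):
    return sum(step_work(s) for s in steps)


def data_partition_cost(steps, split_at, cores):
    # Evaluate steps up to `split_at` serially, then divide the matched
    # nodes among cores; each core runs the remaining steps on its share.
    prefix = sum(step_work(s) for s in steps[:split_at + 1])
    suffix = sum(step_work(s) for s in steps[split_at + 1:])
    workers = min(cores, max(1, steps[split_at].cardinality))
    return prefix + suffix / workers


def query_partition_cost(steps, split_at, cores):
    # Spread the predicate branches of one step over the cores;
    # every other step is evaluated serially.
    target = steps[split_at]
    workers = min(cores, target.branches)
    rounds = math.ceil(target.branches / workers)
    rest = sum(step_work(s) for i, s in enumerate(steps) if i != split_at)
    return rest + target.cardinality * rounds


def choose_plan(steps, cores):
    """Return (strategy, split_at, estimated_cost) with the lowest cost."""
    candidates = [("serial", None, serial_cost(steps))]
    for i in range(len(steps)):
        candidates.append(("data", i, data_partition_cost(steps, i, cores)))
        candidates.append(("query", i, query_partition_cost(steps, i, cores)))
    return min(candidates, key=lambda plan: plan[2])


if __name__ == "__main__":
    # Made-up statistics for a path such as /site/people/person/name.
    path = [StepStats("site", 1), StepStats("people", 1),
            StepStats("person", 1000), StepStats("name", 1000)]
    print(choose_plan(path, cores=4))
```

Under this toy model, a step with many matched nodes favors data partitioning, while a step with few nodes but several independent predicates favors query partitioning. A realistic model would also have to account for partitioning overhead and for merging results.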