Font Size: a A A

Thor: A universal XML index for efficient XPath query processing

Posted on:2009-12-23Degree:Ph.DType:Dissertation
University:Wayne State UniversityCandidate:Pettovello, P. MarkFull Text:PDF
GTID:1448390005460261Subject:Computer Science
Abstract/Summary:
The infancy of XML, in conjunction with the rapid growth and future potential that XML will become the data structure format of choice for many years to come, is a powerful motivation for developing the highest performance index possible for XPath. High speed XPath processing is essential for XQuery, since a single XQuery query may require execution of a large number of nested XPath queries. Relational Database Management Systems, RDBMS, have a very large installed base with large investment support therefore much effort has been placed on protecting and reusing existing relational technology by transforming the XPath requirements into existing RDBMS architectures, either by encoding limited additional data into the B+Tree index structure or by encoding the XML structure into relational tables. In general, the semi-structured data content and hierarchical tree structure of an XML document does not fit well into the relational model. To address the need, recently, there have been a large number of native XML databases, NXDB, released. Although commercial database systems provide capabilities to process XPath, they are largely optimized for rapid processing of ancestor-descendant and value based queries, yet structure navigation in these systems is still relatively slow and can be improved upon. Therefore, performance of the state of the art XML aware database systems has yet to achieve the pinnacle of performance.;In this dissertation, to address the XPath query performance challenge, we propose the creation of an application specific hierarchical navigational index system that can be constructed on top of a relational database 150 tuple storage system. The name of this new index is THOR, Threaded Hierarchical on Relational, and the name of the system is THOR4XP, Threaded Hierarchical on Relational for XPath, and was previously known in the literature as MTree, Multi-Threaded Tree. The main contributions are: (1) Introduction of the following and preceding threaded pointers into a tree data structure; (2) Integration of multiple doubly-linked, threaded, node label paths into a tree; (3) Combination of TSPath path summary index and structure index to improve query performance; (4) In situ indexing to eliminate the need for data model and query model transformations; (5) Hash Path Join (HPJ), Optimal Path Join (OPJ) and the Subtree Path Join (SPJ) algorithms for use with navigational indexes; (6) Index design that substantially reduces the need for sorting intermediate sequences; (7) Enriched index leaf structure that encodes both node properties and edge properties; (8) Composite partial multilevel intermediate node structures enabling multiple access paths into the same leaf set and tuple set that is integrated with a value index; (9) Application as a distributed P2P routing and indexing method; (10) A holistic systems design and implementation of a new hierarchical index that can work in combination with underlying relational storage systems in such a way as to make it feasible to integrate the index into relational database management systems.;The join algorithms efficiently resolve element name specific XPath navigational queries, in many cases without a need for sorting or for qualified name filtering on intermediate sequences. The optimization methods are applicable for all axes but are presented for the four major XPath axes: descendant, ancestor, following and preceding. Experimental results are included that show substantial performance improvements over other well known methods.
Keywords/Search Tags:XML, Index, Xpath, Structure, Query, Performance, Data, Relational
Related items