Font Size: a A A

Structural summaries for efficient XML query processing

Posted on:2012-08-08Degree:Ph.DType:Dissertation
University:Indiana UniversityCandidate:Brenes Barahona, SofiaFull Text:PDF
GTID:1458390011952369Subject:Computer Science
Abstract/Summary:
As the number of applications that rely on XML data increases, so does the need for performing efficient XML query evaluation. A critical part of the solution involves providing new techniques for designing effective XML indices and lookup algorithms, since traditional value-based indices are not fit to handle the structural features of XML and the pattern matching characteristics of its queries. While extensive research has been done on both the formal properties of XML query languages and the engineering aspects of structural index design, there currently exists a gap between these efforts, as the existing approaches to structural indices for XML have not taken advantage of the extensive research that has been done in the theoretical study of XML query languages.;In this dissertation, we draw from research methodologies used in the study of relational query languages and the design of relational indices and extend it to the context of XML. In particular, we study the roles various fragments of the XPath algebra play in distinguishing data components in an XML document, and leverage the results in the design of the Trie indices: a family of disk-based structural indices for efficient XML query processing.;Our indices include the N[k]Trie and P[k]Trie indices, based respectively on the node and node-pair partitions of an XML document. We also present a family of workload-aware trie-based indices: the WP[k]Trie, AW[k]Trie, and W[k]Trie indices. These indices improve on the P[k]Trie index by using query workload information to produce indices that yield better overall throughput while balancing the space and performance trade-off at the core of index design. They extend the P[k]Trie index to include frequent label-paths and a carefully selected set of complementary label-paths.;Except for the N[k]Trie index, all of our indices are able to answer any core XPath query using an index-only query evaluation plan, including queries with predicates and queries with '//' in the middle. Additionally, our workload aware indices are optimal for answering frequent path queries in one index lookup and efficient for answering non-frequent path queries using an index-only plan.;The indices presented in this dissertation are the result of the application of a new methodology in the context of XML and provide an important contribution to the state of the art of XML access methods for evaluating XPath queries, and offer a new alternative that presents significant gains in efficiency at reasonable costs.
Keywords/Search Tags:XML, Indices, Structural, Queries, Trie
Related items