Structural summaries for efficient XML query processing

Posted on:2012-08-08

Degree:Ph.D

Type:Dissertation

University:Indiana University

Candidate:Brenes Barahona, Sofia

Full Text:PDF

GTID:1458390011952369

Subject:Computer Science

Abstract/Summary:

As the number of applications that rely on XML data increases, so does the need for performing efficient XML query evaluation. A critical part of the solution involves providing new techniques for designing effective XML indices and lookup algorithms, since traditional value-based indices are not fit to handle the structural features of XML and the pattern matching characteristics of its queries. While extensive research has been done on both the formal properties of XML query languages and the engineering aspects of structural index design, there currently exists a gap between these efforts, as the existing approaches to structural indices for XML have not taken advantage of the extensive research that has been done in the theoretical study of XML query languages.;In this dissertation, we draw from research methodologies used in the study of relational query languages and the design of relational indices and extend it to the context of XML. In particular, we study the roles various fragments of the XPath algebra play in distinguishing data components in an XML document, and leverage the results in the design of the Trie indices: a family of disk-based structural indices for efficient XML query processing.;Our indices include the N[k]Trie and P[k]Trie indices, based respectively on the node and node-pair partitions of an XML document. We also present a family of workload-aware trie-based indices: the WP[k]Trie, AW[k]Trie, and W[k]Trie indices. These indices improve on the P[k]Trie index by using query workload information to produce indices that yield better overall throughput while balancing the space and performance trade-off at the core of index design. They extend the P[k]Trie index to include frequent label-paths and a carefully selected set of complementary label-paths.;Except for the N[k]Trie index, all of our indices are able to answer any core XPath query using an index-only query evaluation plan, including queries with predicates and queries with '//' in the middle. Additionally, our workload aware indices are optimal for answering frequent path queries in one index lookup and efficient for answering non-frequent path queries using an index-only plan.;The indices presented in this dissertation are the result of the application of a new methodology in the context of XML and provide an important contribution to the state of the art of XML access methods for evaluating XPath queries, and offer a new alternative that presents significant gains in efficiency at reasonable costs.

Keywords/Search Tags:

XML, Indices, Structural, Queries, Trie

Related items

1	Operation Structural Retrieval Of Mathematical Expressions Based On Trie Tree
2	The Study On High-speed Algorithms Of Packet Classification And Its Implement Techniques
3	Research On String Similarity Join Algorithm
4	The Research Of Query Optimization Based On Bitmap Indices For Data Warehouse
5	The Research Of The Comprehensive And Efficient IP Lookup Algorithm Based On Trie
6	Design And Implementation Of A Pipelined IP Lookup Structure Based On Trie
7	Research On Extending Skyline Queries Over Big Data
8	Queries And Computations On Ciphertexts In The Cloud
9	The Research Of Key Techniques Of Uncertain TOP-K Query Processing
10	Research On Efficient Index Structure And Parallelization Based On Double Array Trie