XML query optimization in Timber

Posted on:2005-11-10

Degree:Ph.D

Type:Thesis

University:University of Michigan

Candidate:Wu, Yuqing

Full Text:PDF

GTID:2458390008486904

Subject:Computer Science

Abstract/Summary:

With the growing importance of XML as a format for data representation and data transport, being able to process XML queries efficiently becomes a topic of interest. Relational database techniques, even though mature, don't always apply in an XML context, due to the special features of XML data and XML queries. In this thesis, we study these features and investigate various aspects of query processing and query optimization in the framework of the Timber native XML database system.; We evaluate XML queries by breaking them into pattern trees, which specify the nodes to match, and the desired relationships between the nodes. The structural relationships are evaluated via structural joins. We introduce two families of structural join algorithms (Tree-Marge and Stack-Based), which take advantage of the (DocId, StartPos : EndPos, LevelNum) representation of positions of XML components and perform structural join efficiently, with time, space and I/O complexity linear to the input and output lists.; In this thesis, we focus on cost-based optimization for pattern tree matching. We propose five structural join order selection algorithms and show that fully-pipelined plans are possible for any given query pattern. Moreover, left-deep plans, which are chosen as the rule-of-thumb in a relational database, are not desirable. Relaxing the fully-indexed assumption brings issues related to data instantiation into the picture. The challenge is that in XML databases, the "data associated with an element" can vary. We define data instantiation as a first-class physical operator and take into consideration both the placement and granularity of data instantiation while optimizing pattern tree matching.; Accurate cost-based optimization depends on accurate result size estimation. We propose a novel summary structure, position histogram, to capture the position relationship between nodes in an XML document. Experiments demonstrate that the Ph-Hist algorithm is capable of estimating the result size accurately, for both single twig patterns and patterns that are arbitrarily complex.; Collectively, we are trying to address issues of storing and querying XML data efficiently, in the context of the Timber project. However, the theories behind the techniques proposed in this thesis apply irrespective to how the XML data is stored.

Keywords/Search Tags:

XML data, Query optimization, XML queries, Timber, Pattern tree matching

Related items

1	The Research And Improvement Of XML Tree Pattern Matching Query Algorithm
2	Research On Key Technologies Of Fuzzy XML Data Storages And Queries
3	Querying Optimization Research Based On XML Database
4	The Design And Optimization Of XML Data Query Method Based On The SF&B Compressing Storage Structure
5	Visual Construction Of Scientific Data Queries And Query Processing Optimization Techniques
6	Research On Efficient Processing Of XML Twing Pattern Queries
7	Research On The Optional Matching Issue In RDF Queries
8	Optimization Of Spatial-textual Queries
9	The Research Of Compression Storage Organization And Query Processing Methods Of XML Data Modeled As A Tree
10	Optimizing Twig Query Pattern Based On XML Schema