Efficient structural join processing algorithms

Posted on:2006-09-05

Degree:Ph.D

Type:Thesis

University:Hong Kong University of Science and Technology (People's Republic of China)

Candidate:Liu, Kaiyang

Full Text:PDF

GTID:2458390005996995

Subject:Computer Science

Abstract/Summary:

Extensible Markup Language (XML) has become the de facto standard for data representation, exchange and publishing over the Internet. As more and more XML data is gathered and processed by database management systems, we need innovative approaches to efficiently manage and query it. However, XML queries possess some unique features, which distinguish them from the traditional RDBMS queries and pose a challenge to the XML research community. This dissertation investigates the problem of how to improve the XML path expression performance by considering the unique features of path expression, since path expression is the core part of XML queries. By fully exploiting these features, we are able to propose some novel algorithms to significantly improve XML path expression performance.; We first investigate the issue of optimizing XML path expressions without considering the workload. A basic kind of XML path expression is the structural query, which retrieves all descendants of a specific XML element. Structural queries are important because they constitute the building block of various complex XML path expressions. Among proposed XML query processing techniques for structural queries, structural join outperforms most other graph-traversal based approaches as it minimizes the number of nodes accessed to evaluate a structural query. In the first part of this dissertation we show that general path expressions, involving conditions on attributes, possess inherent multi-dimensional characteristics that are not captured by existing XML query processing methods based on 1D ordering. Motivated by this, we index XML data with multi-dimensional access methods and develop efficient algorithms to minimize the query cost. Extensive experimental evaluations confirm that our algorithms provide significant performance gains for numerous query types.; Our next goal is to further improve the path expression performance by considering the query workload. All existing indices for structural join focus on indexing the encoding information of XML elements so as to improve the XML path expression performance. Consequently, such indices utilize only data characteristics, but ignore query characteristics that are important for the further improvement of the query performance. In this thesis, we propose AC-tree (Adaptive Cluster B+-tree), which is a fully workload-aware structural join index, to provide a simple, but efficient way to exploit the XML query characteristics for performance improvement. To the best of our knowledge, AC-tree is the first structural join index to take the workload into consideration. An extensive set of experiments confirm that AC-tree outperforms competitors significantly for both simple (i.e., non-branching) XML path expressions and branching path expressions.

Keywords/Search Tags:

XML, Structural join, Efficient, Processing, Algorithms, Data

Related items

1	Query Processing And Research For Structural Join Against XML Data
2	Research And Implementation Of Structural Join In XML Data Graph
3	Structural join: Processing algorithms and size estimation
4	Energy-Efficient Join Algorithms In Multi-Memory Hierarchies
5	Efficient join processing in spatial database systems
6	Research On Structural Join Oriented XML Data Compression
7	Research On Encoding Scheme And Structural Join Algorithm Of XML Data
8	Efficient SPARQL Theta Join Processing On Large Scale RDF Graphs
9	Efficient Similarity Join Over Probabilistic Data Streams Based On Earth Mover's Distance
10	On Optimization Of Join Query Algorithms For Massive Data