Research On Tree Pattern Matching Of XML

Posted on:2012-09-03

Degree:Master

Type:Thesis

Country:China

Candidate:Y Hu

Full Text:PDF

GTID:2218330338462751

Subject:Computer software and theory

Abstract/Summary:

PDF Full Text Request

With the rapid development of the Internet and Web technology, XML has already become a universal standard for data representation and exchange. Increasing XML data brings new challenges to the traditional database technology, how to query XML data efficiently has become an important research field of database management.This paper studies the problem of matching tree pattern query over XML documents; it can be seen as the multi-step of structural connections, these step operations are treated as a whole, which avoid a large number of independent nodes to process, thereby improving the query efficiency. Tree pattern matching can be used as an operator in XQuery, and has been widely regarded as the core operation of XML query, so a lot of research has been made on this field. The earlier algorithms are mostly based on two phases; first decompose the query into a number of individual path matches, through an index or other filtering methods for partial match results, and then stitch them together to form result. In this way, the query contains "/" may generate the intermediate results that can not build the final output, and the merge phase will cause high computational cost. Algorithms recently designed are trying to minimize the number of unrelated intermediate results, while avoiding the merge operations, but they are mostly based on complex structures, costly to implement, in particular, the worst case may have to build the entire document tree in memory.To solve the above problem, this paper proposes a new approach to evaluate tree pattern matching, research has been done in following aspects:(1) this paper analyzes the existing matching algorithms, and summarizes the improvements that may efficiency the query; (2) based on DataGuide, also considers that a query usually has only a few nodes to output, this paper propose a novel single-phase approach, TwigFilter, which handle the query process according to the output nodes, thus reducing the number of unrelated intermediate nodes, avoiding the corresponding I/O and matching overheads. The following experiment compared with TJFast algorithm show the effectiveness of the TwigFilter algorithm; (3) consider the query preprocessing under DTD, this paper proposes the improved algorithm patternFilter, taking into account the situation that a query may have multi-output node to handle, and optimize the processing order of the algorithm. The experiment results show that the patternFilter algorithm has better performance for tree pattern queries than other algorithms such as TwigStack, TJFast and TwigFilter.

Keywords/Search Tags:

XML, DTD, Tree pattern matching, Data Guide, Preprocessing

PDF Full Text Request

Related items

1	The Research And Improvement Of XML Tree Pattern Matching Query Algorithm
2	Data Compression Research Based On The Multiple Pattern Matching Algorithm
3	Research Frequent Pattern Mining Algorithm Based On Compact Pattern Tree And Multiple Minimum Support
4	The Research Of Multi-pattern Matching Algorithm Based On Sequential Binary Tree
5	XML Data Mining Based On Frequent Pattern Tree
6	Research On The Image Matching Method Based On Point Pattern Matching
7	Data Quality Control: Research, Design, And Implementation In Data Preprocessing
8	Research On Pattern Matching In Trees In Multi-dimensional Space
9	SIFT Based Image Matching Algorithm Research
10	Research On Technique Of Web Log Mining