Font Size: a A A

Research On Tree Pattern Matching Of XML

Posted on:2012-09-03Degree:MasterType:Thesis
Country:ChinaCandidate:Y HuFull Text:PDF
GTID:2218330338462751Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With the rapid development of the Internet and Web technology, XML has already become a universal standard for data representation and exchange. Increasing XML data brings new challenges to the traditional database technology, how to query XML data efficiently has become an important research field of database management.This paper studies the problem of matching tree pattern query over XML documents; it can be seen as the multi-step of structural connections, these step operations are treated as a whole, which avoid a large number of independent nodes to process, thereby improving the query efficiency. Tree pattern matching can be used as an operator in XQuery, and has been widely regarded as the core operation of XML query, so a lot of research has been made on this field. The earlier algorithms are mostly based on two phases; first decompose the query into a number of individual path matches, through an index or other filtering methods for partial match results, and then stitch them together to form result. In this way, the query contains "/" may generate the intermediate results that can not build the final output, and the merge phase will cause high computational cost. Algorithms recently designed are trying to minimize the number of unrelated intermediate results, while avoiding the merge operations, but they are mostly based on complex structures, costly to implement, in particular, the worst case may have to build the entire document tree in memory.To solve the above problem, this paper proposes a new approach to evaluate tree pattern matching, research has been done in following aspects:(1) this paper analyzes the existing matching algorithms, and summarizes the improvements that may efficiency the query; (2) based on DataGuide, also considers that a query usually has only a few nodes to output, this paper propose a novel single-phase approach, TwigFilter, which handle the query process according to the output nodes, thus reducing the number of unrelated intermediate nodes, avoiding the corresponding I/O and matching overheads. The following experiment compared with TJFast algorithm show the effectiveness of the TwigFilter algorithm; (3) consider the query preprocessing under DTD, this paper proposes the improved algorithm patternFilter, taking into account the situation that a query may have multi-output node to handle, and optimize the processing order of the algorithm. The experiment results show that the patternFilter algorithm has better performance for tree pattern queries than other algorithms such as TwigStack, TJFast and TwigFilter.
Keywords/Search Tags:XML, DTD, Tree pattern matching, Data Guide, Preprocessing
PDF Full Text Request
Related items