Font Size: a A A

The Research And Improvement Of XML Tree Pattern Matching Query Algorithm

Posted on:2008-02-21Degree:MasterType:Thesis
Country:ChinaCandidate:N YangFull Text:PDF
GTID:2178360212493767Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
XML(eXtensible Markup Language) has become the focus of research and industrial communities as the representative of new technologies on the Web. XML is a self-described and flexible data format and is fast emerging as the dominant standard for representing and exchanging information over the Internet. To fulfill its potential to build effective distributed computing platform and web applications, we need effective and efficiently query and filtering techniques to extract, synthesize and analyze their contents. Traditional database technologies can't work efficiently owing to the tree-like nature of XML data and new application environment. New technologies specially designed for XML data are needed to process XML data efficiently.Query is one of the most important issues in data processing. Path expression is the core of XML query and filtering language, so how to effective and efficiently evaluate path expressions play a key role. XML queries typically specify patterns of selection predicates on multiple elements that have some specified tree structured relationships. The primitive tree structured relationships are parent-child and ancestor-descendant, and finding all occurrences of these relationships in an XML database is a core operation for XML query processing. Finding all distinct matching of the query tree pattern is the core operation of XML query evaluation.The existent methods for tree pattern matching are decomposition-matching -merging processes, such as Twig Join Algorithm, which may produce massive useless intermediate result or require repeated matching of some sub-patterns. This thesis proposes a flexible tree pattern matching algorithm called PatternMatch to find all distinct matching of a query tree pattern directly. The PatternMatch Algorithm does not produce any useless intermediate results and the final results are compactly encoded in stacks, from which the explicit representation can be produced efficiently.In this paper, we first introduce the background of tree pattern matching algorithm and the research achievements of structural joins algorithm, secondly, this paper gives a detailed description of XML query language and XML index. Further more, through particularly analyzing a representative algorithm-Twig Join, we derives lots of detects from present structural joins algorithm, which can result in massive useless intermediate result or repetition of some sub-tree pattern matching. Based on this, we proposes a kind of improved algorithm-PatternMatch and elaborates the algorithm advantages through the concrete performance analysis. At last, we carries on a summary of this paper, pointes out the shortcomings of PatternMatch and prospects the further work.
Keywords/Search Tags:XML, Compact Tree Indexing, Tree Pattern Matching, XML query evaluation, Path Summary
PDF Full Text Request
Related items