Font Size: a A A

Research On Improved XML Twig Matching Algorithm Based On Version Tree

Posted on:2011-07-25Degree:MasterType:Thesis
Country:ChinaCandidate:M L YaoFull Text:PDF
GTID:2178360308955383Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
At present, XML (Extensible Markup Language) has become the standard for data representation and data exchange over World Wide Web. As more and more XML data emerges, query processing over XML data shows its increasing importance. Currently, a large number of XML query processing algorithms based on tree model has been successively proposed. However, semi-structured nature of XML document and increasing complex of the users'queries brings new challenges over efficient processing XML data as follows: (1) the input size and execution time of these algorithms will increase rapidly as the size of XML document increases; (2) due to the irregular nature of XML data, the results that algorithm returned may exist duplicate phenomenon and require post-processing operation. Therefore, how to efficiently and accurately execute query on a large number of XML data becomes an important topic.In this paper, we propose an efficient and extensible XML twig matching algorithm named Twig3Version and an efficient XML twig matching algorithm named AdvancedTwigVersion in case of the returned results are matched twigs and non-redundant target elements respectively. The main research work and characteristic of the thesis are as follows:(1) In case of the returned results are matched twigs, we propose an idea of applying the proposed holistic twig pattern matching algorithms based on original document to compressed structure (Version Tree), present simple and efficient version filter module and merge join module. Structure matching on the concise Version Tree along with version filter and merge join on the concise intermediate results greatly reduces the query time.(2) In case of the returned results are non-redundant target elements, we propose AdvancedTwigVersion algorithm which improves TwigVersion algorithm. By using the label information of the ancestor element as much as possible, those elements that do not satisfy match can be filtered out as early as possible, thus speed up the matching process. In addition, we compare AdvancedTwigVersion with other XML twig matching algorithms in this case though a series of experiments.Both theoretical proof and experimental results reported in this paper demonstrate the good performance of Twig3Version and AdvancedTwigVersion in case of returned results are all matched twigs and non-redundant target elements respectively. With the development of internet and the progress of the era, query processing on XML data will have much huger development space and brighter future.
Keywords/Search Tags:XML, Query Processing, Twig Pattern Matching, Delete redundancy, Version Tree
PDF Full Text Request
Related items