Font Size: a A A

Research Of The XML Documents Filiering Model By Sequencing Twig Patterns

Posted on:2014-02-01Degree:MasterType:Thesis
Country:ChinaCandidate:J C HouFull Text:PDF
GTID:2248330398960152Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With mass information on internet, personalized subscribe service has gradually become an important means for people to obtain information. The core technology of the personalized subscribe service is how to process the mass data which is expressed in XML format quickly and efficiently. For the XML data flow, it generally uses XPath expression to query. Thus how to process the query for the mass XPath expression on the XML data flow is the core problem of the XML document filtering.Research on the XML document filtering based on twig pattern, mainly divided into two directions. One is the thought of decomposition. The core is how to match the position relationship between nodes efficiently. The algorithms have a common problem that it needs scan descendant node list repeatedly in some cases. This leads to a large number of intermediate results, thus affecting the efficiency of the algorithm seriously. Besides, it hardly supports the query for the expressions which contains much relative path expression and wildcard. The other direction uses the idea of holistic matching, namely taking the position relationship between nodes in the path expression into account, match the query as a whole. The main approach is to transform the input XML document as well as the query tree into a sequence with a kind of encoding, and then match the sequences with a kind of traversal strategy. There are lots of popular algorithms of this idea, but most of them do not support logic predicate processing.In this paper, we propose a novel filtering model called FST(Filtering by Sequencing Twigs) based on the analysis of current mainstream XML filtering algorithms. The model takes the idea of holistic match, the algorithm of Prufer sequence generation of the model transforms the XML document tree and twig patterns into strings, thus the problem of XML document filtering becomes the problem of Prufer sequence match. As auxiliary, the thesis presents a novel data structure to store the structure information of the XML document tree. The process of match uses a data structure called global runtime stack, it could guarantee the order of twig patterns correspond to the XML document. At the same time, we use a global runtime stack to optimize the algorithm from the generation of sequential index and the limitation of the subsequence matching range the two aspects, which ensures all the filtering are completed with XML document traversal one time. It improves the efficiency greatly. We also propose an improved strategy on twig patterns and a preprocess method which supports value-based predicates and logic predicates. Experimental data show that the XML filtering algorithm by sequential twig patterns in this paper has a greater efficiency than previous algorithms.
Keywords/Search Tags:predicate processing, XML filtering, twig pattern, FST
PDF Full Text Request
Related items