Research On Technology For Uncertain XML Of Complex Twig Query Processing

Posted on: 2015-09-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Han
GTID: 2298330422490186
Subject: Computer application technology
Abstract/Summary:
With the rapid development of data management technology, the way people perceive data has gradually changed in recent years. Recognizing that uncertainty is an inherent attribute of data, researchers have moved beyond the traditional, idealized definition of data, and uncertain data is now known to be widespread in everyday life. At the same time, advances in computer technology have brought a new revolution in database technology. With its flexible structure, the Extensible Markup Language (XML) has become the de facto standard for data storage and information exchange and plays an increasingly important role on the Internet. Thanks to its self-describing structure and related characteristics, XML not only breaks through the strict schema constraints that the relational data model imposes on data, but is also well suited to describing uncertain data. As a result, uncertain XML management technology is attracting more and more attention. At present, most queries over uncertain XML are based on structure matching and content matching. In real-world applications, however, queries over uncertain XML often carry complex semantics: the query pattern is not merely a simple twig pattern, but may also contain logical predicates, wildcards, and other constructs that express richer query semantics. How to process complex twig queries over uncertain XML is therefore an urgent problem.

After analyzing and summarizing the existing algorithms for complex twig query processing over XML, we found that they are not suitable for processing complex twig queries over uncertain XML. Yet in real life, simple twig query semantics cannot meet users' needs, so research on complex twig queries over uncertain XML has practical value. First, we proposed a new encoding method, REDewey, which extends the classic prefix encoding scheme EDewey to handle distribution nodes.
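The abstract does not spell out the REDewey layout, so the following is only a minimal sketch of the Dewey-style prefix idea that EDewey, and hence REDewey, builds on. It assumes (hypothetically) that distribution nodes receive no label component of their own and that the probability of the edge leaving a distribution node is stored alongside the child's component. Under the usual independence assumption for distribution nodes, a node's existence probability is then the product of the edge probabilities on its root path. The class `Label` and the function `lowest_common_ancestor` are illustrative names, not taken from the thesis.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Label:
    """Hypothetical Dewey-style label for an ordinary node in probabilistic XML.

    path: one sibling ordinal per level from the root (Dewey components).
    edge_probs: assumed probability of the edge entering each level; 1.0 when
        the parent is an ordinary node, p when it is a distribution node.
    """
    path: tuple[int, ...]
    edge_probs: tuple[float, ...]

    def is_ancestor_of(self, other: Label) -> bool:
        # Dewey prefix property: a proper prefix labels an ancestor.
        k = len(self.path)
        return k < len(other.path) and other.path[:k] == self.path

    def is_parent_of(self, other: Label) -> bool:
        # A parent is an ancestor exactly one level up.
        return self.is_ancestor_of(other) and len(other.path) == len(self.path) + 1

    def existence_prob(self) -> float:
        # Assuming independent distribution nodes, the node exists only if
        # every edge on its root path is chosen.
        return prod(self.edge_probs)


def lowest_common_ancestor(a: Label, b: Label) -> tuple[int, ...]:
    """Return the Dewey path of the lowest common ancestor (longest common prefix)."""
    n = 0
    for x, y in zip(a.path, b.path):
        if x != y:
            break
        n += 1
    return a.path[:n]


# Usage: root 1, child 1.2 reached with probability 0.8, grandchild 1.2.1 with 0.5.
root = Label((1,), (1.0,))
child = Label((1, 2), (1.0, 0.8))
grandchild = Label((1, 2, 1), (1.0, 0.8, 0.5))
print(root.is_ancestor_of(grandchild), child.is_parent_of(grandchild))  # True True
print(grandchild.existence_prob())  # 0.4
```

Because ancestor-descendant and parent-child tests reduce to prefix comparisons on the labels, structural joins never need to revisit the document itself, which is the main reason prefix schemes are popular for twig matching.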
Second, we proposed the Prob-BooleanTwig algorithm to process twig patterns that may contain the logical predicates AND, OR, and NOT. The method uses a path index to cluster the codes that lie on the same path, which speeds up the matching of leaf nodes. During query processing, the query pattern is traversed only once and a different matching strategy is applied according to the predicate type, which improves query efficiency. To filter out, as early as possible, the nodes that do not participate in the final result, and to handle wildcard queries, we further proposed an optimized algorithm, Prob-BooleanStarTwig. This algorithm builds an LSPI index on top of the original path index, adding sibling labels to speed up the matching of AND and OR predicates. We also proposed a finite automaton for twig patterns to perform exact path matching, and converted wildcards into ancestor-descendant (A-D) edges plus level (hierarchical information) constraints. Built on bottom-up pattern matching, the algorithm applies three layers of filtering: a probability threshold filter, a level information filter, and a path information filter.

We conducted extensive experiments to evaluate the algorithms, varying the document size, the probability threshold, and the query cases, and used response time as the main evaluation metric. Both analytical and experimental results show that the query efficiency of Prob-BooleanTwig and Prob-BooleanStarTwig is significantly better than that of existing algorithms.
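The wildcard conversion and three-layer filtering described above can be illustrated on a single linear root-to-leaf path. The sketch below is not Prob-BooleanStarTwig and does not handle AND/OR/NOT branches: it assumes each candidate node carries its root-to-node tag path (as EDewey-style labels allow) and a precomputed existence probability, and it uses a compiled Python regular expression as a stand-in for the twig finite automaton (it recognizes the same regular language, although Python's `re` engine backtracks rather than running a DFA). The names `Node`, `compile_path`, and `filter_candidates` are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical candidate node for one query leaf."""
    tag_path: tuple[str, ...]  # root-to-node tag names, e.g. decoded from an EDewey-style label
    prob: float                # precomputed existence probability


def compile_path(query: str) -> tuple[re.Pattern, int, bool]:
    """Compile a linear path such as '/a/*/b' or '/a//b'.

    Returns (regex over '/'-joined tag paths, minimum depth, exact-depth flag).
    A '*' step becomes "any one tag" and still counts one level; a '//' step
    adds an A-D gap of zero or more levels, which relaxes the depth check.
    """
    parts: list[str] = []
    depth, exact = 0, True
    for axis, name in re.findall(r"(//|/)([^/]+)", query):
        if axis == "//":
            parts.append(r"(?:/[^/]+)*")  # ancestor-descendant gap
            exact = False
        parts.append("/[^/]+" if name == "*" else "/" + re.escape(name))
        depth += 1
    return re.compile("".join(parts)), depth, exact


def filter_candidates(nodes: list[Node], query: str, threshold: float) -> list[Node]:
    """Keep nodes that pass three filters, cheapest first:
    probability threshold, level constraint, then path matching."""
    pattern, depth, exact = compile_path(query)
    kept = []
    for node in nodes:
        if node.prob < threshold:                                 # 1. probability threshold
            continue
        level = len(node.tag_path)
        if level < depth or (exact and level != depth):           # 2. level information
            continue
        if not pattern.fullmatch("/" + "/".join(node.tag_path)):  # 3. path information
            continue
        kept.append(node)
    return kept


nodes = [
    Node(("a", "c", "b"), 0.9),
    Node(("a", "b"), 0.7),
    Node(("a", "c", "d", "b"), 0.6),
    Node(("a", "c", "b"), 0.2),
]
print(len(filter_candidates(nodes, "/a/*/b", 0.5)))  # 1
print(len(filter_candidates(nodes, "/a//b", 0.5)))   # 3
```

Here `/a/*/b` is treated as "`b` below `a`, exactly three levels from the root", so the level check alone discards `/a/b` and `/a/c/d/b` before any string matching happens. Ordering the filters from cheapest to most expensive mirrors the abstract's goal of pruning non-answers early, although the thesis's own ordering and index structures may differ.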
Keywords/Search Tags: Uncertain Data, XML, LSPI Index, Complex Twig Pattern