Research On Sequence-based Indexing And Query Processing Technology For Uncertain XML

Posted on:2015-09-18

Degree:Master

Type:Thesis

Country:China

Candidate:P Wang

Full Text:PDF

GTID:2298330422990189

Subject:Computer application technology

Abstract/Summary:


XML is a data standard released by the W3C in February 1998. Because it is a subset of SGML and is backed by the W3C, XML is becoming the standard for data interchange in the information world. XML is used throughout data storage and data interchange, for example in data storage for Web applications, application configuration files, and data sharing between applications. The objective world is complex, so data processing must deal with uncertain data. Given the advantages of XML itself, using XML to store uncertain data has become a current trend in the development of XML technology. Uncertain data stored as XML together with probabilistic information is called uncertain XML, and querying uncertain XML has become a hotspot of current research.

At present, uncertain XML twig pattern matching follows two approaches: binary structural joins and holistic matching. Binary structural joins seriously degrade query efficiency because they produce many useless intermediate results. Holistic matching concentrates the query process too heavily, which makes probabilistic threshold filtering inconvenient: the threshold cannot be used efficiently to improve query efficiency. To address these problems, this thesis applies sequence-based matching to uncertain XML queries and, by improving the LCS-TRIM algorithm, proposes two sequence-based uncertain XML twig pattern matching algorithms, PrTRIM and H-PrTRIM.

Compared with an ordinary XML document, an uncertain XML document carries additional probability information, which must be processed correctly during a query. In this thesis, we build an index called PSI by extracting structural information and content information from the uncertain XML document. PSI allows probability information to be processed correctly, for example to recognize exclusive distribution nodes and to calculate the probabilities of query results.
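The abstract does not describe the internal layout of PSI, so the following is only a minimal sketch of the two probability tasks it mentions, under an assumed model: the widely used probabilistic XML model in which ordinary nodes are interleaved with IND (independent) and MUX (mutually exclusive) distribution nodes, and each edge leaving a distribution node carries a probability. Under that assumption, a set of matched nodes can co-occur only if no two of them lie under different children of the same MUX node; otherwise their joint probability is the product of the edge probabilities on the union of their root paths. The names `Node`, `kind`, and `match_probability` are hypothetical and are not taken from the thesis.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical node type for an assumed IND/MUX probabilistic XML model.
@dataclass(eq=False)  # identity-based equality/hashing, so nodes can be dict keys
class Node:
    label: str
    kind: str = "ord"   # "ord" (ordinary), "ind" or "mux" (distribution node)
    prob: float = 1.0   # probability of the edge from the parent to this node
    parent: Optional[Node] = field(default=None, repr=False)
    children: List[Node] = field(default_factory=list)

    def add(self, child: Node) -> Node:
        child.parent = self
        self.children.append(child)
        return child

def path_to_root(node: Node) -> List[Node]:
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path

def match_probability(nodes: List[Node]) -> float:
    """Probability that all given nodes exist together in a random world."""
    edges = {}        # union of root-path edges, keyed by their lower node
    mux_choice = {}   # MUX node -> the child selected on some root path
    for n in nodes:
        path = path_to_root(n)
        for child, parent in zip(path, path[1:]):
            edges[child] = child
            if parent.kind == "mux":
                chosen = mux_choice.setdefault(parent, child)
                if chosen is not child:
                    return 0.0  # two different branches of one exclusive node
    p = 1.0
    for n in edges:
        p *= n.prob     # edges are independent once MUX choices are consistent
    return p

# Toy document: two mutually exclusive authors and an independent title.
root = Node("book")
mux = root.add(Node("mux", kind="mux"))
a1 = mux.add(Node("author", prob=0.7))
a2 = mux.add(Node("author", prob=0.3))
ind = root.add(Node("ind", kind="ind"))
title = ind.add(Node("title", prob=0.9))

print(round(match_probability([a1, title]), 2))  # 0.63
print(match_probability([a1, a2]))               # 0.0
```

Recording the chosen child of each MUX node while walking the root paths is what detects exclusive-branch conflicts; an index could store such path information up front instead of walking parent pointers for every query.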
Subsequence matching and structure matching during a query also rely on PSI.

Some query results in uncertain XML have probabilities so low that they have no practical value. Such results can be filtered out during the query with a probabilistic threshold: a probability value given in the query, which requires the probability of every query result to be greater than or equal to it. In this thesis, probabilistic threshold filtering can be applied three times during a query. This ensures that the query results satisfy the probabilistic threshold and improves query efficiency at the same time.

The experiments compare PrTRIM and H-PrTRIM in three aspects: the effect of query statements, of the probabilistic threshold, and of document size on query efficiency. The experimental results are then analyzed. They show that for small documents and query statements with simple structure, the efficiency of H-PrTRIM is close to, though still higher than, that of PrTRIM. For large documents and query statements with complex structure, H-PrTRIM is more efficient than PrTRIM. In summary, H-PrTRIM has advantages for large documents and complex structure queries.
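The probabilistic threshold filtering described above pays off because, in a model where probabilities multiply along paths, extending a partial match can only lower its probability, so a partial match that already falls below the threshold can be discarded early. The sketch below illustrates this pruning principle on a single parent-child path query over a toy tree in which each edge has an independent existence probability. It is an assumption-laden illustration: it ignores exclusive (MUX) nodes and full twig structure, and it does not reproduce the thesis's three filtering stages. `path_matches` and the dictionary encoding are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy encoding: node -> list of (child, probability that the child
# exists given that its parent exists); labels maps node -> tag name.
Tree = Dict[str, List[Tuple[str, float]]]

def path_matches(tree: Tree, labels: Dict[str, str], root: str,
                 query: List[str], threshold: float):
    """Matches of a parent-child label path starting at root, keeping only
    those whose probability is >= threshold."""
    if not query or labels[root] != query[0]:
        return []
    frontier = [((root,), 1.0)]
    for tag in query[1:]:
        extended = []
        for path, p in frontier:
            for child, cp in tree.get(path[-1], []):
                if labels[child] != tag:
                    continue
                q = p * cp
                # Every factor is <= 1, so a longer match can never recover:
                # prune as soon as the probability drops below the threshold.
                if q >= threshold:
                    extended.append((path + (child,), q))
        frontier = extended
    return frontier

tree = {"r": [("b1", 0.9), ("b2", 0.2)], "b1": [("c1", 0.5)], "b2": [("c2", 0.9)]}
labels = {"r": "A", "b1": "B", "b2": "B", "c1": "C", "c2": "C"}
print(path_matches(tree, labels, "r", ["A", "B", "C"], 0.4))
# [(('r', 'b1', 'c1'), 0.45)]
```

With a threshold of 0.4, the branch through `b2` is discarded after one step (0.2 < 0.4), so its subtree is never visited, and only `r/b1/c1` survives with probability 0.45.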

Keywords/Search Tags:

Uncertain XML, Sequence, Twig pattern, Probabilistic threshold


Related items

1	Research On Technology For Uncertain XML Of Complex Twig Query Processing
2	Study On Holistically Twig Matching Algorithm Over Probabilistic XMLs
3	THE AERODYNAMIC PROPERTIES OF SPRUCE TWIG ELEMENTS
4	Research Of XML Query Algorithm Based On Twig Pattern Matching
5	Research On Query Processing Technology For XML Data Based On Holistic Twig Pattern
6	Research On Labeling And Twig Pattern Query Of XML Data
7	Research On Query Process Technology For Continuous Uncertain XML
8	Research On Algorithms For Discovering And Querying Sequential Pattern In Uncertain Sequence Databases
9	Research And Implementation Of Twig Query Evaluation Algorithms On Probabilistic XML
10	Research On Clustering And Pattern Mining Techniques For Uncertain Data Streams