Research Of Query On The Probabilistic XML Document

Posted on: 2017-06-15
Degree: Master
Type: Thesis
Country: China
Candidate: X F Yang
GTID: 2348330518470769
Subject: Computer Science and Technology

Abstract/Summary:
With advances in data acquisition and processing technology and a deepening understanding of uncertain data, research on uncertain data has attracted increasingly wide attention. XML, a markup language published by the W3C, is flexible and well suited to describing, representing, and storing uncertain data. XML is therefore used more and more to represent uncertain data in many fields, which has led to a dramatic growth in the number of probabilistic XML documents. Processing large numbers of probabilistic XML documents with traditional XML clustering and query techniques takes too much time to be acceptable. Although XML clustering and query techniques have been studied extensively, several aspects still need to be improved. Focusing on queries over probabilistic XML documents, this thesis studies the existing methods in depth, analyzes their problems, and proposes solutions. The main work is as follows.

First, clustering XML documents directly is highly accurate but consumes a large amount of time and space, whereas clustering them through their DTD documents consumes much less. This thesis therefore proposes WSDTD, a DTD-based XML document clustering method. Because a DTD document reflects the characteristics of the XML documents that conform to it, clustering the DTD documents also clusters the XML documents. WSDTD defines a structural similarity and a semantic similarity between DTD trees and applies the K-Means algorithm to the DTD documents, thereby clustering the corresponding XML documents.
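The abstract does not give WSDTD's similarity definitions, so the following is only a minimal sketch of the general idea under assumed definitions. Each DTD is reduced to a hypothetical `{parent: [children]}` element tree. Parent/child edges stand in for structural features, and lowercased element names are a crude stand-in for semantic features (a real semantic measure would likely use synonyms or a thesaurus). The weight `w_struct` balancing the two is an assumption, and plain Lloyd's K-Means runs on the resulting weighted binary vectors. The names `features`, `vectorize`, and `kmeans` are illustrative, not taken from the thesis.

```python
import math
from itertools import chain

# Hypothetical representation: a DTD reduced to its element tree {parent: [children]}
DTD = dict[str, list[str]]

def features(dtd: DTD) -> set[str]:
    """Structural features (parent/child edges) plus semantic features (element names).

    Lowercased names are only a crude stand-in for a real semantic similarity measure.
    """
    edges = {f"E:{parent}/{child}" for parent, kids in dtd.items() for child in kids}
    names = {f"N:{name.lower()}" for name in chain(dtd, *dtd.values())}
    return edges | names

def vectorize(dtds: list[DTD], w_struct: float = 0.5) -> list[list[float]]:
    """Binary feature vectors; edge features weighted by w_struct, names by 1 - w_struct."""
    feats = [features(d) for d in dtds]
    vocab = sorted(set().union(*feats))
    weight = {f: (w_struct if f.startswith("E:") else 1.0 - w_struct) for f in vocab}
    return [[weight[f] if f in fs else 0.0 for f in vocab] for fs in feats]

def kmeans(vectors: list[list[float]], k: int, iters: int = 20) -> list[int]:
    """Lloyd's K-Means; the first k vectors seed the centroids (deterministic, for illustration)."""
    centroids = [list(v) for v in vectors[:k]]
    labels: list[int] = []
    for _ in range(iters):
        new = [min(range(k), key=lambda j: math.dist(v, centroids[j])) for v in vectors]
        if new == labels:
            break  # assignments unchanged: converged
        labels = new
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

dtds = [
    {"bookstore": ["book"], "book": ["title", "author", "price"]},
    {"bookstore": ["book"], "book": ["title", "author"]},
    {"movies": ["movie"], "movie": ["title", "director", "year"]},
]
labels = kmeans(vectorize(dtds), k=2)
print(labels)  # [0, 0, 1]: the two bookstore DTDs share a cluster
```

Each XML document would then inherit the cluster label of the DTD it conforms to, so a query only needs to scan the documents in one cluster. Seeding the centroids with the first k vectors keeps the example deterministic; a practical implementation would more likely use random or k-means++ seeding.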
Clustering in this way effectively narrows the scope of each query and lays a foundation for querying large collections of XML documents.

Second, based on the characteristics of probabilistic XML documents, this thesis designs pTwigList, a query algorithm for probabilistic XML documents. pTwigList improves on TwigList, a traditional XML query algorithm, by adding Top-K query processing to it. It also uses an improved region encoding and threshold filtering, so that query results with a low probability of existence are discarded. By clustering the documents before querying, the thesis achieves a preliminary form of efficient querying over probabilistic XML documents.

Third, simulation experiments verify the effectiveness of the proposed WSDTD and pTwigList, using query time as the measure of query efficiency. The experiments have two parts. The first part shows that clustering a large set of XML documents with WSDTD and then running pTwigList within a single, much smaller cluster takes less time than running pTwigList directly over the whole collection. The second part verifies the effectiveness of pTwigList under different test cases, different values of K, and different file sizes.
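The abstract names three ingredients of pTwigList: region encoding, probability-threshold filtering, and Top-K selection. The sketch below illustrates only those ingredients on a simple ancestor-descendant path query such as //book//title; it is not TwigList. A naive nested-loop join replaces TwigList's stack-based list construction, and the probability model is an assumption: each node carries a conditional existence probability given its parent, so its absolute probability is the product along its root path, and a path match exists with the probability of its deepest node. The tuple tree format and the names `Node`, `encode`, `is_ancestor`, and `top_k_path` are hypothetical.

```python
import heapq
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    tag: str
    start: int    # region code: position when the DFS enters the node
    end: int      # region code: position when the DFS leaves; descendants lie strictly inside
    level: int    # depth, which a parent-child (/) step would also check
    prob: float   # absolute existence probability (product of conditional probabilities)

def encode(tree, parent_prob=1.0, level=0, pos=None, out=None):
    """DFS over a (tag, conditional_prob, children) tuple tree, emitting Nodes in start order."""
    if pos is None:
        pos, out = [0], []
    tag, cond_prob, children = tree
    prob = parent_prob * cond_prob
    start = pos[0]
    pos[0] += 1
    slot = len(out)
    out.append(None)  # reserve this node's slot; its end code is known only after the children
    for child in children:
        encode(child, prob, level + 1, pos, out)
    out[slot] = Node(tag, start, pos[0], level, prob)
    pos[0] += 1
    return out

def is_ancestor(a, d):
    """Region-code containment test for the ancestor-descendant (//) axis."""
    return a.start < d.start and d.end < a.end

def top_k_path(nodes, path, k, threshold):
    """Top-k matches of //path[0]//path[1]//... ranked by probability.

    Under the assumed model a path match exists exactly when its deepest node exists,
    so its probability is that node's prob. Probabilities never increase going down
    the tree, so any node below the threshold can be dropped before joining.
    """
    matches = [(n,) for n in nodes if n.tag == path[0] and n.prob >= threshold]
    for tag in path[1:]:
        candidates = [n for n in nodes if n.tag == tag and n.prob >= threshold]
        # Naive nested-loop structural join (TwigList itself uses stack-based lists)
        matches = [m + (d,) for m in matches for d in candidates if is_ancestor(m[-1], d)]
    return heapq.nlargest(k, matches, key=lambda m: m[-1].prob)

doc = ("bookstore", 1.0, [
    ("book", 0.9, [("title", 1.0, []), ("author", 0.8, [])]),
    ("book", 0.4, [("title", 0.5, [])]),
    ("book", 0.7, [("title", 0.6, [])]),
])
nodes = encode(doc)
for match in top_k_path(nodes, ["book", "title"], k=2, threshold=0.3):
    print([n.tag for n in match], round(match[-1].prob, 2))
# ['book', 'title'] 0.9
# ['book', 'title'] 0.42
```

The second book's title exists with probability 0.4 × 0.5 = 0.2, so the threshold filter discards it before the join. Because a node's probability never exceeds its ancestors', a partial match that falls below the threshold can never grow into a qualifying one, which is what makes early filtering safe.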
Keywords/Search Tags: XML clustering, query of uncertain XML, pTwigList algorithm