
Research on Document Distribution Technology for Probabilistic XML Data

Posted on: 2016-09-25
Degree: Master
Type: Thesis
Country: China
Candidate: F J Sun
Full Text: PDF
GTID: 2308330461977076
Subject: Software engineering
Abstract/Summary:
As uncertain data becomes more widely used, existing selective dissemination of information (SDI) technologies cannot deliver accurate information in response to user requests. Users therefore receive necessary and unnecessary information together, which wastes their time and network traffic when they consult it. To retrieve the information that actually meets users' requests, we design a distribution system for documents containing uncertain data, which improves the accuracy of the information delivered to users.

This thesis uses probabilistic XML document distribution technology to filter out unnecessary information and satisfy users' personalized requirements. The fundamental idea is to express each user's individual information needs comprehensively, match those needs against the source documents with a probabilistic XML document filtering algorithm, and then distribute the matched documents to the users.

First, user needs are expressed as XPath expressions. Based on the XPath expression tree, each query captures the query content, the structural information, and a probability threshold. Each user's XPath expression is then decomposed into a query sequence, and the PXtrie probabilistic index structure is constructed so that the query substrings of multiple users can be matched against the source documents efficiently.

Second, probabilistic XML documents are filtered against the users' requirements. XML documents containing uncertain data are parsed with SAX, and the node search algorithm proposed in this thesis locates the content that users request. The filter then checks the query content, the structural information, and the probability threshold, and updates the query information through the matching update algorithm.

Finally, a probabilistic XML document distribution system is designed and implemented. The system includes four modules: query decomposition, index building, probabilistic XML document filtering, and pre-processing.

Probabilistic XML document distribution technology addresses the compressed storage and efficient indexing of multi-user XPath expressions, and it also provides users with accurate information. The PXtrie index structure is proposed to improve the efficiency of multi-user substring matching and to reduce redundant matching. Experiments verify the availability and effectiveness of the system. The results show that the system can process XML documents that contain uncertain data and can deliver accurate information that meets users' personalized needs.
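The pipeline summarized in the abstract (query decomposition, a shared index over multi-user XPath queries, and SAX-based filtering against a probability threshold) can be illustrated with a minimal Python sketch. It is not the PXtrie structure or the node search algorithm from the thesis, whose details the abstract does not give. It substitutes a plain prefix trie and assumes that queries are simple absolute child-axis paths such as /a/b/c, that each element may carry a hypothetical prob attribute holding its probability given its parent, and that the probability of a path is the product of those values. All names (TrieNode, decompose, build_index, Filter, matching_users) are illustrative.

```python
import xml.sax


class TrieNode:
    def __init__(self):
        self.children = {}  # element name -> TrieNode
        self.queries = []   # (user, threshold) pairs whose path ends here


def decompose(xpath):
    """Split a simple absolute XPath such as '/a/b/c' into ['a', 'b', 'c'].

    Assumption: only child-axis steps, no predicates or wildcards.
    """
    return [step for step in xpath.split("/") if step]


def build_index(subscriptions):
    """Build a shared prefix trie from (user, xpath, threshold) triples.

    Queries with a common prefix share trie nodes, so each prefix is
    stored and matched only once for all users.
    """
    root = TrieNode()
    for user, xpath, threshold in subscriptions:
        node = root
        for step in decompose(xpath):
            node = node.children.setdefault(step, TrieNode())
        node.queries.append((user, threshold))
    return root


class Filter(xml.sax.ContentHandler):
    """Walks the trie in step with the SAX element events."""

    def __init__(self, index):
        super().__init__()
        # Stack of (trie node or None, probability of the current path).
        self.stack = [(index, 1.0)]
        self.matched = set()

    def startElement(self, name, attrs):
        node, prob = self.stack[-1]
        child = node.children.get(name) if node is not None else None
        # Hypothetical 'prob' attribute; elements without it are certain.
        prob *= float(attrs.get("prob", "1.0"))
        if child is not None:
            for user, threshold in child.queries:
                if prob >= threshold:
                    self.matched.add(user)
        self.stack.append((child, prob))

    def endElement(self, name):
        self.stack.pop()


def matching_users(index, xml_text):
    """Return the users whose query matches xml_text at or above threshold."""
    handler = Filter(index)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.matched


if __name__ == "__main__":
    index = build_index([
        ("alice", "/library/book/title", 0.5),
        ("bob", "/library/book/title", 0.9),
        ("carol", "/library/journal", 0.1),
    ])
    doc = '<library><book prob="0.8"><title prob="0.9">X</title></book></library>'
    print(sorted(matching_users(index, doc)))  # prints ['alice']
```

In this example the title element is reached with probability 0.8 × 0.9 = 0.72, so the document would be distributed to alice (threshold 0.5) but not to bob (threshold 0.9), and carol's path does not occur at all. Because alice and bob share one trie path, the document is matched against that path only once.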
Keywords/Search Tags: SDI, XML document, uncertain data, filtering algorithm, PXtrie probabilistic indexing architecture