
Research On Indexing Technology For Continuous Uncertain XML

Posted on: 2016-01-17
Degree: Master
Type: Thesis
Country: China
Candidate: D D Guo
GTID: 2298330452971203
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of network technology, XML has become a popular data format and the de facto standard for data exchange on the Internet. Uncertainty is common in real-world data, and traditional deterministic data cannot describe the real world accurately. As research on uncertain data continues and data acquisition and processing technology becomes better understood, uncertain data has found a wide range of applications in logistics, industry, finance, the military, and other fields. Uncertainty in a database essentially captures the state of a real world that is constantly changing, for example a monitored pressure or temperature, or the location of a moving target. In an XML document, uncertain information can be expressed as a probability value or a probability distribution. For continuous uncertain data, the document stores the range of possible values together with a probability density function (pdf) instead of a single value. A probability threshold range query takes a query range and a probability threshold and returns the results whose probability of satisfying the query range exceeds the given threshold. Because each result is additionally qualified by the probability with which it satisfies the query, probability threshold range queries are more accurate and more informative than traditional queries. As users' query demands grow and diversify, building effective XML indexes faces serious challenges, and XML indexing has become an active research topic. In practice, much data follows a continuous distribution. Building on existing XML indexes, this thesis proposes an index called RLPI for probability threshold range queries. The RLPI index is applicable to any continuous uncertain XML.
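
As a minimal sketch of the probability threshold range query described above, assume each uncertain value follows a Gaussian distribution. This is an assumption made only for illustration; the abstract does not fix a distribution, and the names `GaussianValue` and `threshold_range_query` are hypothetical. The probability of a value lying in the query range is the integral of its pdf over that range, which for a Gaussian is a difference of two CDF values.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GaussianValue:
    """Hypothetical continuous uncertain value: a node id plus a Gaussian pdf."""
    node_id: str
    mean: float
    std: float  # must be > 0

    def cdf(self, x: float) -> float:
        # Standard closed form of the normal CDF via the error function.
        return 0.5 * (1.0 + math.erf((x - self.mean) / (self.std * math.sqrt(2.0))))

    def prob_in_range(self, low: float, high: float) -> float:
        # Integral of the pdf over [low, high].
        return self.cdf(high) - self.cdf(low)


def threshold_range_query(
    values: List[GaussianValue], low: float, high: float, tau: float
) -> List[Tuple[str, float]]:
    """Return (node_id, probability) for values whose probability of
    lying in [low, high] exceeds the threshold tau."""
    results = []
    for v in values:
        p = v.prob_in_range(low, high)
        if p > tau:
            results.append((v.node_id, p))
    return results


readings = [GaussianValue("t1", 20.0, 1.0), GaussianValue("t2", 30.0, 1.0)]
hits = threshold_range_query(readings, 19.0, 21.0, 0.5)
```

Here only `t1` qualifies: its probability of lying within one standard deviation of its mean is about 0.68, while `t2` has almost no mass in the range. Each result carries its probability, which is the extra information a traditional range query would not return.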
First, the encoding is improved on the basis of Dewey coding: to handle the encoding of the distribution nodes IND and MUX in uncertain XML, a prefix encoding called PEDewey is proposed. Second, in the RLPI path index, index entries with the same reverse table path are stored together to save space. In the RLPI value index, continuous uncertain data are preprocessed and combined with a corresponding filtering strategy, so that nodes irrelevant to the query are filtered out and fewer pdf computations are needed, which improves query speed. Because computing the pdf of continuous uncertain data is time-consuming, an optimization, the CUXI index tree, is proposed to improve query speed further. This algorithm borrows ideas from the R-Tree, which builds a spatial index tree recursively from the top down. The corresponding index tree is built by clustering the continuous uncertain XML data. Its nodes compute in advance and store some related information about the continuous uncertain data, which is used to filter out nodes irrelevant to a probability threshold range query, reducing the number of elements processed and improving query speed. The experiments use document size, query case, and probability threshold as variables and compare query response times to test the performance of the algorithms. Analysis of the experimental results shows that the proposed RLPI index algorithm and CUXI index tree algorithm are effective.
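
The filtering idea can be illustrated with a small sketch. It is not the RLPI or CUXI algorithm itself, whose details the abstract does not give. It assumes each index entry stores precomputed bounds `lo` and `hi` outside which its pdf has no mass, so an entry whose bounds do not overlap the query range can be discarded without evaluating the pdf. The names `IndexEntry`, `uniform_prob`, and `filtered_query` are hypothetical, and uniform distributions are used only to keep the example exact.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class IndexEntry:
    """Hypothetical index entry with precomputed support bounds."""
    node_id: str
    lo: float  # precomputed lower bound of the pdf's support
    hi: float  # precomputed upper bound of the pdf's support
    prob: Callable[[float, float], float]  # stands in for the costly pdf integration


def uniform_prob(lo: float, hi: float) -> Callable[[float, float], float]:
    """Probability mass of a uniform distribution on [lo, hi] within [a, b]."""
    def prob(a: float, b: float) -> float:
        overlap = min(b, hi) - max(a, lo)
        return max(0.0, overlap) / (hi - lo)
    return prob


def filtered_query(
    entries: List[IndexEntry], a: float, b: float, tau: float
) -> Tuple[List[Tuple[str, float]], int]:
    """Return the qualifying (node_id, probability) pairs and the number
    of entries for which the probability had to be computed."""
    results = []
    evaluated = 0
    for e in entries:
        # Filter step: no overlap with the query range means probability 0,
        # so the entry is skipped without touching its pdf.
        if e.hi < a or e.lo > b:
            continue
        evaluated += 1
        p = e.prob(a, b)
        if p > tau:
            results.append((e.node_id, p))
    return results, evaluated


entries = [
    IndexEntry("n1", 0.0, 10.0, uniform_prob(0.0, 10.0)),
    IndexEntry("n2", 20.0, 30.0, uniform_prob(20.0, 30.0)),
    IndexEntry("n3", 8.0, 12.0, uniform_prob(8.0, 12.0)),
]
results, evaluated = filtered_query(entries, 9.0, 11.0, 0.4)
```

In this run `n2` is pruned by its bounds alone, so only two of the three probabilities are computed, and only `n3` exceeds the threshold. A tree-structured index in the spirit of the R-Tree would apply the same test to bounds stored at internal nodes, pruning whole subtrees at once.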
Keywords/Search Tags: Continuous uncertain data, XML, Index, Probability threshold range queries