Font Size: a A A

Research On Top-k Query Algorithm Over Continuous Probabilistic XML

Posted on:2014-07-27Degree:MasterType:Thesis
Country:ChinaCandidate:C H ZhengFull Text:PDF
GTID:2268330422960770Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
The uncertainty of data is widespread in many practical applications, such assensor network, information extraction, data integration system and scientific datamanagement system, etc. Traditional relational databases use structured way to store,not suitable for storage and management of uncertain data. At present, XML hasbecome the Internet information description and exchange standard, it is a kind of semi-structured data language with good scalability and self-describing which is moresuitable to describe the uncertain data. In recent years, probabilistic XML proposed byresearchers is a new representation method of uncertain data. Querying technology ofprobabilistic XML has become a hot spot in the research. In general, querying theprobabilistic XML will return results with an additional probability value. Users usersare usually interested in results with maximum probability value, the research onefficient Top-k queries algorithm over probabilistic XML has been widely noted byresearchers.At present, researchers did not deal with continuous uncertain data in the Top-kqueries algorithm over probabilistic XML that has been proposed, however thecontinuous uncertain data is widespread in the real world, such as temperature measuredby temperature sensors in a certain period of time obey a normal distribution, thequantity of broken bottles in brewery during a year obey a normal distribution, etc.Therefore, Study of querying technology of continuous uncertain XML has a certainpractical significance. Aiming at high efficient Top-k queries over continuous uncertainXML data, expand PEDewey coding to support the encoding of continuous distributiontypes of nodes, put forward SPCProTJFast algorithm on the basis of TJFast algorithmwhich is a common classic inquires algorithm over XML. In this algorithm, thetraditional merge algorithm is improved, filter algorithm is designed to deal with thecontinuous distribution types of nodes. The efficiency of the query has been improvedby means of multi-layer filtering mechanism. Due to an increase in the number of consecutive nodes and probability lower limit value is too small will reduce theefficiency of SPCProTJFast algorithm, an optimization algorithm called HPCProTJFastis proposed. Firstly, sift out twig that meet the condition of probability lower bound, andthen deal with continuous nodes associated with the query. It increases the filterlikelihood of consecutive nodes by rapidly increasing the lower bound of pathprobability, and has improved the query efficiency.Through a lot of experiments, compare SPCProTJFast algorithm, HPCProTJFastalgorithm and CProTJFast algorithm that just process continuous uncertain data simply.In the process of experiment, analyzing query time and the node rate for each queryalgorithm by changing the size of the document, k value and the query conditions. Theexperimental results show that the efficiency of SPCProTJFast and HPCProTJFastalgorithm is significantly higher than CProTJFast algorithm, and HPCProTJFastalgorithm performs better.
Keywords/Search Tags:P-document Model, Extend Dewey Encoding, Continuous UncertainXML, Top-k Query
PDF Full Text Request
Related items