Font Size: a A A

Research On Uncertain XML Keyword Search Based On The Semantic Of SLCA

Posted on:2015-11-18Degree:MasterType:Thesis
Country:ChinaCandidate:L C SuFull Text:PDF
GTID:2298330422990188Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
In recent years, the querying technology of XML has became a hot spot in theresearch. According to the different query mode, the XML data query can be dividedinto XML structural query and XML keyword query, but comparing to the XMLstructural query, users accustome to the XML keyword query which does not need thespecial field of knowledge. With the progress of the technology for data acquisitionand progressing, most of the data in the real world are uncertain. The uncertain XMLproposed by researchers is a new representation method of uncertain data, and theuncertain XML data have been widely used in field of economic, telecommunicationsand the military. Currently, the research on the uncertain XML keyword query isseldom. Because the uncertain XML keyword query will return results with anadditional probability value, users are usually interested in the results with the khighest property. The research on top-k keyword query algorithm over uncertainXML has been widely noted by researchers.Firstly, Existing algorithms of keyword search over uncertain XML are based onstack, which need to put the nodes into the stack, get the nodes out the stack and comparethe strings frequently, leading to a waste of time easily. To solve this problem, this paperproposes a new keyword search algorithm named PrList based on the dynamic keyworddata repository. This algorithm firstly initializes the dynamic keyword data repository, thentraverses the items of the keyword data repository from bottom to top, left to right to getthe SLCA nodes, and it does not need to put the nodes into the stack, get the nodes out thestack and compare the strings. Secondly, Exiting algorithms of Top-k keyword searchover uncertain xml just return root nodes with the k highest probabilistic existence,and they have to construct subtree results that meet some certain conditions, which areinefficient in practice. To solve this problem, this paper defines a novel Top-k querysemantics over uncertain xml named SRCT-Top-k based on smallest relatedconnected subtree, which returnes the smallest related connected subtree with the khighest probabilistic existence. In order to process the SRCT-Top-k search easily, thispaper extends the dynamic keyword data repository which becomes extended dynamic keyword data repository, and then proposes an algorithm named PrListTop-k based onthe extended dynamic keyword data repository. PrListTop-k finds the subtree resultsmeeting some certain conditions by scanning the dynamic keyword data repositoryonly once, and developes filtering strategies to reduce the number of intermediateresults.A lot of experiments are done in this paper. Through setting different query, thecomparisons test between PrList algorithm and PrStack algorithm are made, and thencompare PrListTop-k and PrListTop-k-N which has no filtering strategy. Then thefinal experiment results are analyzed in detail, which proves these two algorithms areefficiency.
Keywords/Search Tags:Uncertain XML, Keyword search, Top-k search, Dynamic keyword datarepository
PDF Full Text Request
Related items