Research On Top-k Keyword Search Algorithm In Probabilistic XML Document

Posted on: 2013-04-27, Degree: Master, Type: Thesis
Country: China, Candidate: X P Zhou, Full Text: PDF
GTID: 2248330371470766, Subject: Computer Science and Technology
Abstract/Summary:
At present, a growing share of the data on the Internet is transferred or manipulated in the form of XML, whose flexibility makes it convenient to exchange complicated semistructured data. In the real world, however, semistructured data described with XML often contains considerable uncertainty, for example noisy sensor data, or faulty data produced during information extraction and image processing. Probabilistic XML, a core topic in research on uncertain data, can describe such uncertainty explicitly; it is a semistructured description language for uncertain data.

Because of the uncertainty in a probabilistic XML document, traditional XML query languages cannot return the expected information to users accurately. Therefore, after studying existing keyword search algorithms for ordinary XML documents, this thesis proposes a new top-k keyword search model for probabilistic XML documents. The model consists of three components: partitioning of the probabilistic XML document, construction of its keyword index, and top-k keyword search over it.

When keyword search is run on a large probabilistic XML document, its time efficiency drops noticeably. This thesis adapts an XML partitioning method for ordinary XML documents to probabilistic XML documents: the document is partitioned into several independent parts that are stored on different network nodes, and the keyword search algorithm is then executed in parallel across all partitions.

To reflect node type information in the keyword index structure of a probabilistic XML document, the traditional Dewey coding method is extended, and a new keyword index coding method is designed for probabilistic XML documents.

Based on the work above, the Probabilistic XML Top-k Keyword Search (PTKS) algorithm is designed, and a prototype system has been implemented. The experimental results show that the PTKS algorithm has good time efficiency, especially for complicated probabilistic XML documents.
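For readers unfamiliar with the terms, the following is a minimal sketch of how plain Dewey codes support a Smallest Lowest Common Ancestor (SLCA) lookup on an ordinary XML tree. It is not the thesis's extended coding or the PTKS algorithm, neither of which is specified in this abstract, and it ignores probabilities entirely. The function names (`lca`, `is_ancestor`, `slca`) and the sample node codes are assumptions made for illustration.

```python
from itertools import product

# A Dewey code is the path of child positions from the root, e.g. (0, 1, 0).
# All names and sample data here are illustrative, not taken from the thesis.

def lca(a, b):
    """Lowest common ancestor of two Dewey codes = their longest common prefix."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)

def is_ancestor(a, b):
    """True if node a is a proper ancestor of node b."""
    return len(a) < len(b) and b[:len(a)] == a

def slca(keyword_lists):
    """Brute-force SLCA: take the LCA of every combination of one match per
    keyword, then keep only the LCAs that have no other LCA below them."""
    candidates = set()
    for combo in product(*keyword_lists):
        node = combo[0]
        for other in combo[1:]:
            node = lca(node, other)
        candidates.add(node)
    return sorted(c for c in candidates
                  if not any(is_ancestor(c, d) for d in candidates))

# Hypothetical matches: Dewey codes of the nodes containing each keyword.
xml_nodes = [(0, 0, 0), (0, 1, 0)]
search_nodes = [(0, 0, 1), (0, 2)]
result = slca([xml_nodes, search_nodes])
print(result)
```

The brute-force enumeration is exponential in the number of keywords and is meant only to make the definition concrete; practical SLCA algorithms exploit the sorted order of Dewey codes to avoid enumerating every combination.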
Keywords/Search Tags: Probabilistic XML, Dewey Code, Probabilistic XML Partition, Smallest Lowest Common Ancestor, Keyword Search