
Studies On Keyword Search Over Probabilistic XML Data

Posted on: 2016-03-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Zhao
GTID: 1318330542987067
Subject: Computer software and theory
Abstract/Summary:
Keyword search is widely used in web applications such as information discovery and data mining, and it has been studied for decades. From early work on relational databases and graph databases to more recent work on semi-structured databases, many results have been obtained. Given a set of keywords, keyword search aims to find the matching nodes and the subtrees that contain all of the keywords. Keyword search is readily accepted by users because it does not require them to learn a complex query language or to understand the structure of the underlying data. However, because a keyword query carries no structural information relating its keywords, keyword search is inherently difficult, and further research is needed to return results that meet users' requirements.

Recently, with the development of the Internet and the emergence of uncertain data, the demand for managing uncertain data, for example with the probabilistic XML data model, has grown steadily. This has made keyword search over probabilistic XML data an active research topic with both theoretical significance and practical value. In this dissertation we propose several definitions of keyword search results and several keyword search algorithms for probabilistic XML data, all based on the concept of possible worlds. We also study the relationships between nodes and retrieved results, and between retrieved results and the structure of the data, and propose keyword search algorithms based on user preference. The dissertation focuses only on keyword search over probabilistic XML data. Our contributions are summarized as follows:

(1) We propose an algorithm named Probabilistic ELCA (PrELCA), which solves keyword search on probabilistic XML data. To preserve recall while improving efficiency, the algorithm works with probability distributions: the probability distribution of a parent node is computed according to the node types of its child nodes. Experiments demonstrate the efficiency and accuracy of the PrELCA algorithm.

(2) We propose a probability threshold keyword search algorithm based on ELCA (PrELCA-threshold). First, we define the probability threshold ELCA. Then, we propose a probability range algorithm that preserves accuracy while improving efficiency. Next, we propose a method that narrows the probability range using upper and lower bounds. Finally, we propose a pruning-based algorithm that reduces the amount of computation. The experimental results show that our approach preserves accuracy and improves efficiency.

(3) We propose nearest keyword search on probabilistic XML data (PNK). First, we define nearest keyword search and probability threshold nearest keyword search. Then, we propose a space pruning strategy and a probability pruning strategy to improve efficiency. The experimental results show that these pruning strategies improve the efficiency of our algorithm.

(4) We propose ranking based on probabilistic SLCA on probabilistic XML data. First, we define the probabilistic SLCA and the probability threshold SLCA. Then, we apply traditional keyword search ranking to probabilistic XML data. Finally, we study the relationships between nodes and retrieved results, and between retrieved results and the structure of the data, in order to propose a ranking algorithm for keyword search. The experimental results show the efficiency of the proposed algorithm.

In summary, this dissertation proposes several keyword search algorithms, each based on a different definition of results, to address the challenges posed by uncertain data. On the one hand, they improve the efficiency of keyword search on probabilistic XML data. On the other hand, they complement existing keyword search techniques. Theoretical analysis and experimental results show that our algorithms solve their respective keyword search problems efficiently and outperform previous algorithms in accuracy, execution efficiency and space complexity.
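Contribution (1) describes computing a parent node's probability distribution from the node types of its children. The Python sketch below illustrates one way this bottom-up step can look. It is not the dissertation's PrELCA algorithm. It assumes the common probabilistic XML model with ordinary nodes, IND nodes (children occur independently) and MUX nodes (at most one child occurs), which the abstract does not name explicitly. The names `Node`, `or_convolve` and `mask_distribution` are hypothetical. For each node, the sketch computes the distribution over which keywords its subtree contains, given that the node itself exists.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical node type: "ord" (ordinary), "ind" or "mux" (assumed model).
    kind: str = "ord"
    words: frozenset = frozenset()                # keywords in this node's own text
    children: list = field(default_factory=list)  # list of (edge_prob, Node)


def or_convolve(a, b):
    """Combine two independent keyword-mask distributions by bitwise OR."""
    out = defaultdict(float)
    for mask_a, p_a in a.items():
        for mask_b, p_b in b.items():
            out[mask_a | mask_b] += p_a * p_b
    return dict(out)


def mask_distribution(node, keywords):
    """Return {bitmask: probability} for the keywords found in node's subtree,
    conditional on the node itself existing."""
    own = 0
    for i, kw in enumerate(keywords):
        if kw in node.words:
            own |= 1 << i

    if node.kind == "mux":
        # At most one child occurs: a weighted sum of the child distributions,
        # with the leftover probability mass meaning "no child occurs".
        out = defaultdict(float)
        rest = 1.0
        for p, child in node.children:
            rest -= p
            for mask, q in mask_distribution(child, keywords).items():
                out[mask] += p * q
        out[0] += rest
        return dict(out)

    # "ord" and "ind": children are independent. Ordinary edges use p = 1.0.
    dist = {own: 1.0}
    for p, child in node.children:
        child_dist = defaultdict(float)
        for mask, q in mask_distribution(child, keywords).items():
            child_dist[mask] += p * q
        child_dist[0] += 1.0 - p  # child absent contributes no keywords
        dist = or_convolve(dist, dict(child_dist))
    return dist


# Two leaves, one per keyword, placed under an IND node and under a MUX node.
keywords = ["xml", "search"]
a = Node(words=frozenset({"xml"}))
b = Node(words=frozenset({"search"}))
ind_root = Node(children=[(1.0, Node(kind="ind", children=[(0.5, a), (0.4, b)]))])
mux_root = Node(children=[(1.0, Node(kind="mux", children=[(0.5, a), (0.4, b)]))])

ind_dist = mask_distribution(ind_root, keywords)
mux_dist = mask_distribution(mux_root, keywords)
```

In this toy document the full mask `0b11` (both keywords present) has probability about 0.2 under the IND node and 0 under the MUX node, because mutually exclusive children can never co-occur. A real ELCA or SLCA computation would additionally have to exclude worlds in which a descendant already covers all keywords, and this sketch does not attempt that.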
Keywords/Search Tags: keyword search, probabilistic XML data, probability threshold, ELCA, NK, Ranking