Font Size: a A A

Research Of Uncertainty RDF Keyword Search Based On R-clique

Posted on:2015-07-31Degree:MasterType:Thesis
Country:ChinaCandidate:S ZhangFull Text:PDF
GTID:2308330482954488Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
RDF which is short for Resource Description Framework is the basic markup language of the semantic Web net. RDF is widely used in many fields. Because the method of extracting ontology, the method of marking resource and the technology of measurement usually has errors and noises, uncertainty RDF data are widespread. In recent years, the research on querying uncertainty RDF data has gradually become a hot spot. Since uncertainty RDF data can be modeled as uncertainty RDF graphs, the research on keyword querying over uncertainty RDF data is actually the research on keyword querying over uncertainty graphs. On the basis of the existing research, this thesis proposes two algorithms based on r-clique for keyword querying on uncertainty RDF graphs:KSABR and HABR.The word "clique" whose meaning is "maximal group, sub-group" stands for the sub-graph of an uncertainty graph.The letter "r" is a variablestanding for the distance threshold value. So "r-clique" stands for the sub-graph which contains all the query keyword nodes and the distance between any two nodes in it is not greater than the given value r. In order to improve the query speed, this thesis proposes an approximation algorithm for building r-cliques which produces r-cliques with 2-approximation in polynomial delay.KSABR (Keyword Search Algorithm Based on r-clique) maps the research on keyword querying over uncertainty RDF graphs to the research on finding r-clique over uncertainty graphs. In order to improve the quality of query results, this thesis proposes an algorithm with the higher precision on the basis of KSABR:HABR (High-precision Algorithm Based on r-clique). HABR utilizes the scoring function to sort results, and then return the top-k results to users.In order to accelerate the speed, two index structures are being designed:KI (Keyword Inverted Index) and PI (Probabilistic Inverted Index). KI is used to store the mapping relationship between keywords and keyword nodes, and implement structural pruning and probabilistic pruning. PI is used to store the mapping relationship between keywords and r-cliques, and help generate the function score ().The experiment proves that the algorithm KSABR proposed by this thesis has a good performance on response time and the algorithm HABR has a good performance both on the response time and the quality of results.
Keywords/Search Tags:uncertainty RDF graphs, keyword search, r-clique, pruning technique, top-k query
PDF Full Text Request
Related items