
Intelligent Query And Retrieval Mechanism For Web-Scale RDF Graph Data

Posted on: 2015-05-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Jiang
GTID: 1228330452970614
Subject: Computer application technology
Abstract/Summary:
As a common data model of the Semantic Web, and driven by the linked data movement, Resource Description Framework (RDF) graph data has reached ten billion triples. RDF is a non-classical graph data model: when ontology-layer semantics is expressed, the edges of an RDF graph can also appear as nodes, i.e., the set of edge labels may have a nonempty intersection with the set of nodes. For such web-scale graph data, it is important to study intelligent and efficient query and retrieval mechanisms. The work in this dissertation is summarized as follows:

(1) The existential semantics of property paths is proved equivalent to the current SPARQL 1.1 semantics when duplicate values are removed. Property path expressions are then transformed into extended nested regular expressions based on the existential semantics, and the semantic equivalence after transformation is proved under RDFS semantics. A property path query engine based on the existential semantics is implemented by combining the product automaton algorithm with the transformation rules. Experimental results show that this query engine performs well in both efficiency and reasoning, and they further validate the equivalence between the existential semantics and the current semantics for property paths.

(2) A distributed technique is employed to construct navigational axis indexes for nested regular expressions over RDF graph data, and the frequency of each navigational axis is recorded. Based on these statistics, a new concept of rare axis is proposed, and a new evaluation algorithm for nested regular expressions is designed. For nested regular expressions containing rare axes, the proposed algorithm can run in nearly linear time instead of polynomial time. Experimental results on DrugBank and BioGRID show that this algorithm significantly improves evaluation efficiency while preserving accuracy.

(3) By introducing uncertainty theory, together with a query mode of one primary keyword plus auxiliary keywords and the ORDPATH coding technology, a concept of ontology membership is proposed. The membership value is calculated according to the evidence theory in universal artificial intelligence, and is used to formulate the MultikeyRank algorithm by extending the classical BM25F algorithm. The proposed algorithm was implemented in our distributed large-scale RDF data server "Jingwei". Experimental results show that, compared with BM25F, the proposed algorithm improves P@5, P@10, P@15 and MAP to some extent.

The proposed property path query mode based on nested regular expressions not only keeps the simplicity of property path expressions, but also achieves goal-oriented and efficient reasoning that avoids computing the RDF graph closure, which meets the requirements of navigational path query and reasoning over web-scale RDF data. The semantic keyword retrieval system developed on the MultikeyRank model can identify the user's query intention from only the primary keyword and auxiliary keywords, and return results that reflect the user's preference. These two intelligent query mechanisms improve the user's query experience and, moreover, show the unique appeal of the Semantic Web.
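As a rough illustration of the product automaton idea mentioned in item (1), the sketch below evaluates one path expression over a toy RDF graph by breadth-first search over (graph node, automaton state) pairs. It is not the dissertation's engine: the triples, the hand-built automaton for `rdf:type / rdfs:subClassOf*`, and the names `TRIPLES`, `NFA`, `build_adjacency` and `evaluate_path` are all assumptions made for this example. Answers are collected in a set, so each reachable node is reported once regardless of how many paths lead to it, which is the duplicate-free behaviour that an existential reading of paths gives.

```python
from collections import deque

# Toy RDF graph as (subject, predicate, object) triples.
# All resource names here are invented for illustration.
TRIPLES = [
    ("alice", "rdf:type", "Student"),
    ("Student", "rdfs:subClassOf", "Person"),
    ("Person", "rdfs:subClassOf", "Agent"),
    ("bob", "rdf:type", "Person"),
]

# Hand-built automaton (assumed, not generated from a parser) for the path
#   rdf:type / rdfs:subClassOf*
# delta maps: state -> edge label -> set of next states.
NFA = {
    "start": "q0",
    "accept": {"q1"},
    "delta": {
        "q0": {"rdf:type": {"q1"}},
        "q1": {"rdfs:subClassOf": {"q1"}},
    },
}


def build_adjacency(triples):
    """Index outgoing edges: subject -> list of (predicate, object)."""
    adj = {}
    for s, p, o in triples:
        adj.setdefault(s, []).append((p, o))
    return adj


def evaluate_path(triples, nfa, source):
    """Return the nodes reachable from `source` along a path the automaton accepts.

    The search space is the product of the graph and the automaton: each
    search state is a (graph node, automaton state) pair, visited at most once.
    """
    adj = build_adjacency(triples)
    start = (source, nfa["start"])
    seen = {start}
    queue = deque([start])
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in nfa["accept"]:
            answers.add(node)
        for label, target in adj.get(node, []):
            for nxt in nfa["delta"].get(state, {}).get(label, ()):
                pair = (target, nxt)
                if pair not in seen:
                    seen.add(pair)
                    queue.append(pair)
    return answers


if __name__ == "__main__":
    print(sorted(evaluate_path(TRIPLES, NFA, "alice")))
```

In this toy graph, the query from `alice` finds her declared class and its superclasses by following only the edges relevant to the query, without first materializing the RDFS closure of the whole graph. This is the kind of goal-oriented reasoning the abstract describes, shown here only on a toy scale.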
Keywords/Search Tags: RDF, graph, semantic web, path, intelligence, semantic retrieval, complexity