
Research On Key Technologies Of RDF Graph Data Management

Posted on: 2009-06-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G Wu
GTID: 1118360272491687
Subject: Computer Science and Technology
Abstract/Summary:
Based on the Resource Description Framework (RDF), the Semantic Web makes it possible to share and reuse data across application, enterprise, and community boundaries. The RDF graph is the fundamental data model of RDF, and it is quite different from traditional data models. Consequently, it presents new challenges to traditional data management approaches. First, an RDF graph is a hypergraph, which requires a more complex storage scheme. Second, the implicit semantic information and the full-text information in an RDF graph complicate query evaluation. Third, since web-scale RDF graphs are very common, an effective ranking scheme is indispensable. In this thesis, we address these problems through the following work.

We study the computation of the reflexive and transitive closure in the inference engine, and propose a prime number labeling scheme for directed graphs, called PLSD. PLSD translates reachability between nodes in a directed graph, i.e. reflexivity and transitivity, into divisibility between the integers in their labels. Compared with the conventional forward- and backward-chaining approaches, PLSD computes the reflexive and transitive closure more efficiently. The experimental results also show that PLSD performs better than other labeling schemes.

Exploiting the hypergraph property of the RDF graph model, we propose a native RDF graph storage approach called PI. It avoids the "impedance mismatch" that arises when data is transformed between two inconsistent data models. PI has several advantages: 1) it reduces space cost; 2) it makes different graph-based algorithms easier to implement; and 3) it clusters the directed edges of an RDF graph. We implemented a semantic query system based on this storage approach and the PLSD inference approach described above.
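The core idea of PLSD, turning reachability into divisibility, can be illustrated with a small sketch. It is not the thesis's exact construction. It assumes an acyclic graph (a graph with cycles would first need its strongly connected components collapsed, which is not shown) and uses a simple labeling rule chosen for illustration: each node receives a fresh prime, and its label is that prime multiplied by the least common multiple of its parents' labels. A label is then the product of the primes of the node and all of its ancestors, so u reaches v (or u = v) exactly when label(u) divides label(v); reflexivity and transitivity follow directly from the properties of divisibility. The names `primes`, `prime_labels`, and `reaches`, and the sample class hierarchy, are hypothetical.

```python
from graphlib import TopologicalSorter
from itertools import count
from math import lcm


def primes():
    """Yield 2, 3, 5, 7, ... by trial division (adequate for small examples)."""
    found = []
    for n in count(2):
        if all(n % p for p in found if p * p <= n):
            found.append(n)
            yield n


def prime_labels(parents):
    """Label each node of a DAG; `parents` maps a node to its direct predecessors.

    Hypothetical labeling rule (an assumption, not necessarily PLSD's):
    label(v) = own_prime(v) * lcm(labels of v's parents).
    """
    gen = primes()
    label = {}
    # Topological order guarantees every parent is labeled before its children.
    for node in TopologicalSorter(parents).static_order():
        # lcm() with no arguments is 1, so a root's label is just its own prime.
        inherited = lcm(*(label[p] for p in parents.get(node, ())))
        label[node] = next(gen) * inherited
    return label


def reaches(label, u, v):
    """True if there is a (possibly empty) path from u to v."""
    return label[v] % label[u] == 0


# Edges point from a superclass to its subclasses, e.g. Person -> Student.
parents = {
    "Student": {"Person"},
    "Faculty": {"Person"},
    "GraduateStudent": {"Student"},
    "TeachingAssistant": {"Student", "Faculty"},
}
labels = prime_labels(parents)
print(reaches(labels, "Person", "GraduateStudent"))   # True: transitivity
print(reaches(labels, "Student", "Student"))          # True: reflexivity
print(reaches(labels, "Faculty", "GraduateStudent"))  # False
```

Once labels are assigned, each closure check is a single modulo operation rather than a chain of rule applications. The trade-off visible even in this sketch is label size: a label is a product of as many primes as the node has ancestors, so a practical scheme has to manage how large these integers become.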
Experimental results on the LUBM benchmark show that the proposed approaches outperform existing approaches with respect to the combined metric.

To handle the large-scale full-text information in an RDF graph, we propose a fine-grained keyword search approach that takes the RDF resource as the unit of indexing and retrieval. In this way, keyword search and semantic queries can be combined seamlessly.

For query result ranking, we propose three levels of ranking on the RDF graph model: 1) ranking the importance of concepts and relations at the ontology level; 2) ranking the global importance of resources based on the results of ranking concepts and relations; and 3) ranking based on both keyword search similarity and the global importance of resources. The ontology-level ranking algorithm, named CARRank, mutually reinforces the importance of concepts and the weights of relations during the iteration process. We present a proof of the convergence of CARRank. Experiments and evaluations indicate the effectiveness of the proposed ranking algorithms.

Finally, the proposed approaches and algorithms have been applied to a prototype system. The system has been successfully used to manage semantic data in the field of Chinese news, which also indicates the practical significance of the research in this thesis.
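The ontology-level mutual reinforcement described for CARRank can be sketched in a HITS-like style. The abstract does not give CARRank's update rules, so the ones below are assumptions made for illustration: a relation's weight is the sum of the importance of the two concepts it connects, a concept's importance is the sum of the weights of its incident relations, and both vectors are normalized to sum to 1 after every step. The function `mutual_rank` and the small LUBM-style ontology are hypothetical. Under these particular rules, each round is one step of power iteration on a nonnegative symmetric matrix (degree matrix plus adjacency matrix of the concept graph), which converges for a connected ontology; this is a property of the sketch, not a restatement of the thesis's convergence proof.

```python
def mutual_rank(concepts, relations, tol=1e-12, max_iter=1000):
    """Hypothetical HITS-style mutual reinforcement (not CARRank's exact rules).

    concepts:  list of concept names
    relations: dict mapping a relation name to (domain_concept, range_concept)
    Returns (importance, weight), each normalized to sum to 1.
    """
    importance = {c: 1.0 / len(concepts) for c in concepts}
    weight = {}
    for _ in range(max_iter):
        # Relations that connect important concepts receive larger weights.
        weight = {r: importance[dom] + importance[rng]
                  for r, (dom, rng) in relations.items()}
        total = sum(weight.values())
        weight = {r: w / total for r, w in weight.items()}

        # Concepts incident to heavily weighted relations become more important.
        updated = dict.fromkeys(concepts, 0.0)
        for r, (dom, rng) in relations.items():
            updated[dom] += weight[r]
            updated[rng] += weight[r]
        total = sum(updated.values())
        updated = {c: v / total for c, v in updated.items()}

        # Stop once importance scores no longer change noticeably.
        delta = max(abs(updated[c] - importance[c]) for c in concepts)
        importance = updated
        if delta < tol:
            break
    return importance, weight


concepts = ["Student", "Faculty", "Course", "Department"]
relations = {
    "takesCourse": ("Student", "Course"),
    "memberOf": ("Student", "Department"),
    "advisor": ("Student", "Faculty"),
    "teacherOf": ("Faculty", "Course"),
    "worksFor": ("Faculty", "Department"),
}
importance, weight = mutual_rank(concepts, relations)
for c in sorted(importance, key=importance.get, reverse=True):
    print(f"{c:12s} {importance[c]:.3f}")
```

In this toy ontology, Student and Faculty each take part in three relations and end up ahead of Course and Department, and `advisor`, the only relation joining the two most important concepts, receives the largest weight. The second and third ranking levels (propagating these scores to individual resources and combining them with keyword similarity) are not attempted here.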
Keywords/Search Tags: RDF Graph, Data Management, Semantic Web, Ontology