
Research on Distributed Parallel Reasoning Methods for Massive RDF Data

Posted on: 2017-11-18
Degree: Master
Type: Thesis
Country: China
Candidate: C C Zheng
Full Text: PDF
GTID: 2348330512472467
Subject: Computer software and theory
Abstract/Summary:
RDF (Resource Description Framework) is a framework for describing information on the World Wide Web, proposed by the World Wide Web Consortium (W3C). With the rapid development of Semantic Web technology, the RDF data format is used in many fields, such as bioinformatics, geographic information systems, and general knowledge. Under the pressure of massive data, finding an efficient distributed parallel reasoning method that uncovers the implicit information in large-scale RDF data has become an urgent problem. This thesis studies distributed parallel reasoning over massive RDF data and proposes three parallel reasoning schemes, each built on Hadoop MapReduce and each approaching the problem from a different angle.

First, building on data encoding, the SCOM (Semantic Coding with Ontology on MapReduce) algorithm is proposed. It performs distributed parallel semantic encoding of RDF data and, in doing so, completes parallel reasoning over the RDFS rules. SCOM uses the ontology file to build a class relationship model and a property relationship model. It then performs lossless semantic compression of the RDF data on MapReduce, generating regular, semantics-bearing codes while completing distributed parallel reasoning over the RDFS rules.

Second, the DRRM (Distributed parallel Reasoning algorithm with Rete on MapReduce) algorithm is proposed to address a limitation of SCOM, which can only reason over RDFS rules. DRRM extends the existing centralized Rete algorithm and applies it in a distributed environment for parallel reasoning over RDFS/OWL rules. It uses the ontology of the RDF data to build lists of schema triples and models for rule markup, uses MapReduce to construct the alpha stage and the beta stage of the Rete algorithm, and completes the reasoning of all RDFS/OWL rules in parallel within a single MapReduce job.

Third, a new and efficient parallel reasoning algorithm for RDFS/OWL rules, SPRM (Semantic information Parallel Reasoning on MapReduce), is proposed to address a limitation of DRRM, which is bounded by the memory of the cluster. SPRM classifies the RDFS/OWL rules and builds a transitive closure relation matrix and the connection-variable information from the RDF data ontology and the RDFS/OWL rules. It then generates a rule markup from the matrix and the connection-variable information, which effectively filters out useless data. Finally, based on the rule classification, it designs a reasoning scheme for each type of rule and completes the reasoning of RDFS/OWL rules in parallel on MapReduce.

Experiments show that, on large data sets, SCOM, DRRM, and SPRM are more efficient than existing distributed parallel reasoning algorithms for RDF data. They also verify that SPRM produces far fewer intermediate results and duplicated triples than DRRM, and that SPRM is more efficient than DRRM at reasoning over RDFS/OWL rules.
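To make the general idea concrete, the following is a minimal, single-process sketch of MapReduce-style reasoning for one RDFS rule (rdfs9: if `x rdf:type C` and `C rdfs:subClassOf D`, then `x rdf:type D`). It is not SCOM, DRRM, or SPRM; it only illustrates two ingredients the abstract mentions, a transitive closure built from the ontology and the removal of duplicated triples. The function names (`transitive_closure`, `map_phase`, `reduce_phase`, `infer_types`) and the sample data are assumptions made for illustration, and the map and reduce steps are simulated in memory rather than run on Hadoop.

```python
from collections import defaultdict

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def transitive_closure(schema):
    """Return {class: set of all its superclasses} from subClassOf triples."""
    supers = defaultdict(set)
    for s, p, o in schema:
        if p == SUBCLASS:
            supers[s].add(o)
    changed = True
    while changed:  # repeat until no class gains a new superclass
        changed = False
        for cls in list(supers):
            for parent in list(supers[cls]):
                new = supers.get(parent, set()) - supers[cls]
                if new:
                    supers[cls] |= new
                    changed = True
    return dict(supers)


def map_phase(triples, closure):
    """Map: for each rdf:type triple, emit (subject, class) for the
    asserted class and for every superclass (rule rdfs9)."""
    for s, p, o in triples:
        if p == TYPE:
            yield s, o
            for sup in closure.get(o, ()):
                yield s, sup


def reduce_phase(pairs):
    """Reduce: group by subject and keep each class once, so triples
    derived more than once are emitted a single time."""
    grouped = defaultdict(set)
    for subject, cls in pairs:
        grouped[subject].add(cls)
    return {(s, TYPE, c) for s, classes in grouped.items() for c in classes}


def infer_types(schema, instances):
    """Hypothetical driver: closure first, then one map/reduce pass."""
    closure = transitive_closure(schema)
    return reduce_phase(map_phase(instances, closure))


if __name__ == "__main__":
    schema = [("Dog", SUBCLASS, "Mammal"), ("Mammal", SUBCLASS, "Animal")]
    instances = [("rex", TYPE, "Dog"), ("rex", TYPE, "Mammal")]
    for triple in sorted(infer_types(schema, instances)):
        print(triple)
```

Precomputing the closure on the small schema side means the large instance data is scanned only once in this sketch; whether the thesis algorithms are organized this way is not stated in the abstract.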
Keywords/Search Tags:RDF, OWL, MapReduce, Ontology, Distributed Reasoning