| RDF (Resource Description Framework) is a W3C specification for describing resources on the World Wide Web; it expresses the structure and content of network resources. The RDF and OWL standards of the Semantic Web are widely applied in fields such as health care, life science, e-marketplaces, and geospatial analysis. As data volumes grow, efficient and scalable distributed parallel reasoning methods for large-scale RDF data have become necessary for uncovering implicit information. This thesis designs distributed parallel inference solutions for both fixed data sets and streaming data, proposing parallel reasoning algorithms for static and streaming RDF data based on the Spark distributed platform and the MapReduce computing framework. The results have also been applied to and tested in a real project. The main contributions of this thesis are as follows. First, this thesis proposes the Distributed Parallel Reasoning algorithm based on Spark (DPRS) for fixed data sets. Based on the RDF ontology data, the algorithm constructs the alpha register corresponding to each pattern triple and broadcasts it to every node of the cluster. It determines and marks in advance whether each rule can be activated and performs inference only for the rules that can be activated, so that multiple rules are inferred in parallel within a single distributed MapReduce task. Finally, duplicate triples are deleted in real time and the conflict-set data are updated in the corresponding registers, which further improves the efficiency of the next reasoning iteration. Experimental results show that the algorithm performs parallel reasoning over large-scale data efficiently and correctly. Second, to address the low efficiency of the DPRS algorithm on real-time data, a Parallel Reasoning Algorithm for Streaming RDF 
Data (PRAS) is proposed. The algorithm combines the OWL Horst rules with the RDF ontology files to construct a pseudo-bidirectional network for the rules. The arrival of new data triggers OWL Horst rule reasoning over the new data together with the previously inferred data, which implements heuristic inference for streaming data using the MapReduce framework and the pseudo-bidirectional network. The triples produced by reasoning are then deduplicated and stored in a Redis cluster as input to the next reasoning round, improving the efficiency and completeness of stream reasoning. Experiments show that, on the Spark platform with a Redis cluster, the PRAS algorithm achieves more efficient parallel inference over large-scale streaming data than the DPRS algorithm. Finally, the distributed parallel RDF reasoning methods developed in this research are applied to a remote fault diagnosis system for the micro-laser equipment of an armed force. The first step constructs the micro-laser equipment fault ontology file from the database of fault cases accumulated in the project, whose data are transformed into RDF. The second step implements incremental reasoning over micro-laser equipment fault cases in the cloud using the distributed stream reasoning algorithm of this thesis. In summary, the distributed parallel reasoning methods proposed in this thesis for fixed RDF data sets and streaming RDF data are shown to have practical value and can serve as a reference for OWL Horst rule reasoning over massive data. |
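
The incremental idea behind the streaming algorithm (new triples trigger rule evaluation against previously stored triples, and the deduplicated results feed the next round) can be illustrated with a minimal single-machine sketch. This is not the thesis's implementation: it uses plain Python instead of Spark, an in-memory set as a stand-in for the Redis cluster, and only two of the OWL Horst rules (rdfs9 and rdfs11). The names `apply_rules` and `add_batch` and the sample triples are hypothetical.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def apply_rules(delta: Set[Triple], known: Set[Triple]) -> Set[Triple]:
    """Join newly arrived triples against all known triples for two rules."""
    derived: Set[Triple] = set()
    for (s, p, o) in delta:
        for (s2, p2, o2) in known:
            # rdfs11: (a subClassOf b), (b subClassOf c) => (a subClassOf c)
            if p == SUBCLASS and p2 == SUBCLASS:
                if o == s2:
                    derived.add((s, SUBCLASS, o2))
                if o2 == s:
                    derived.add((s2, SUBCLASS, o))
            # rdfs9: (x type c), (c subClassOf d) => (x type d)
            if p == TYPE and p2 == SUBCLASS and o == s2:
                derived.add((s, TYPE, o2))
            if p == SUBCLASS and p2 == TYPE and o2 == s:
                derived.add((s2, TYPE, o))
    return derived


def add_batch(store: Set[Triple], batch: Iterable[Triple]) -> Set[Triple]:
    """Add one batch to the store (assumed stand-in for Redis) and
    return only the triples that were newly inferred from it."""
    delta = set(batch) - store           # ignore triples already stored
    store |= delta
    inferred: Set[Triple] = set()
    while delta:                         # iterate until no new triples appear
        fresh = apply_rules(delta, store) - store  # deduplicate
        store |= fresh
        inferred |= fresh
        delta = fresh                    # only new results drive the next round
    return inferred


store: Set[Triple] = set()
first = add_batch(store, [("Laser", SUBCLASS, "Device"),
                          ("Device", SUBCLASS, "Equipment")])
second = add_batch(store, [("unit7", TYPE, "Laser")])
```

Here each call to `add_batch` plays the role of one arriving batch of stream data. Only the new triples are joined against the stored ones, so earlier inferences are reused rather than recomputed.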