
Research Of Massive Semantic Information Parallel Inference Method Based On Cloud Computing

Posted on: 2013-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: H J Shi
Full Text: PDF
GTID: 2218330362959398
Subject: Computer software and theory
Abstract/Summary:
With the growth of semantic applications such as semantic search engines and ontology-based data integration, huge amounts of semantic information are being generated. Inferring over and making use of this massive semantic information is a hard problem. Current parallel inference methods fall into five categories: methods based on relational databases, on data partitioning, on distributed hashing, on P2P networks, and on cloud computing. This thesis focuses on parallel inference methods based on cloud computing. Most of these methods use MapReduce and a distributed file system (such as HDFS) to perform inference over massive semantic information. Although this approach offers good performance and scalability, it cannot meet the storage and management requirements of real applications, such as fine-grained management and security of semantic information, because HDFS lacks random file access.

To solve these problems, we propose a massive semantic information inference method based on Bigtable, after analyzing RDF (Resource Description Framework), OWL (Web Ontology Language) and SPARQL (SPARQL Protocol and RDF Query Language). For storage, we use existing compression techniques for massive semantic information to store it in compressed form in Bigtable, and on that basis further optimize the SPARQL query algorithm. For inference, we optimize existing RDFS and OWL forward-chaining inference algorithms by improving rule partitioning, reducing the number of inference steps, and reducing temporary results, so that the algorithms are better suited to Bigtable. We present the design and implementation of the RDFS and OWL inference algorithms and the SPARQL query algorithm.
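To make the idea of forward-chaining RDFS inference concrete, the following is a minimal single-machine sketch in Python. It is not the thesis's algorithm: it applies only two standard RDFS entailment rules (rdfs9 and rdfs11) naively until a fixpoint is reached, with none of the rule-partitioning or temporary-result optimizations described above. The function name `rdfs_closure` and the `ex:` terms are hypothetical and chosen for illustration.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Naive forward chaining to a fixpoint over two RDFS rules.

    rdfs11: (a subClassOf b), (b subClassOf c) => (a subClassOf c)
    rdfs9:  (x type a),       (a subClassOf c) => (x type c)
    """
    closed = set(triples)
    while True:
        # Index the currently known superclasses of each class.
        superclasses = {}
        for s, p, o in closed:
            if p == SUBCLASS:
                superclasses.setdefault(s, set()).add(o)

        derived = set()
        for s, p, o in closed:
            if p == SUBCLASS:  # rdfs11
                for c in superclasses.get(o, ()):
                    derived.add((s, SUBCLASS, c))
            elif p == RDF_TYPE:  # rdfs9
                for c in superclasses.get(o, ()):
                    derived.add((s, RDF_TYPE, c))

        # Stop once a full pass derives nothing new.
        if derived <= closed:
            return closed
        closed |= derived


if __name__ == "__main__":
    facts = {
        ("ex:alice", RDF_TYPE, "ex:GraduateStudent"),
        ("ex:GraduateStudent", SUBCLASS, "ex:Student"),
        ("ex:Student", SUBCLASS, "ex:Person"),
    }
    for triple in sorted(rdfs_closure(facts)):
        print(triple)
```

Each pass of the loop corresponds loosely to one inference step. Reducing the number of such passes and the size of the intermediate `derived` set is the kind of cost the optimizations in the abstract target.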
We ran the optimized algorithms in a four-node experimental environment on LUBM (Lehigh University Benchmark) test data sets. The experimental results show that the proposed optimized algorithms are effective and efficient. We then used Hadoop and HBase to implement a massive semantic information parallel inference engine, MSIPIE (Massive Semantic Information Parallel Inference Engine). Test results show that the inference engine is effective and efficient. Our work makes three contributions: (1) we propose a parallel inference method for semantic information based on Bigtable, which solves the fine-grained control problem caused by the lack of random file access on HDFS; (2) we implement compressed storage of massive semantic information in Bigtable using an existing MapReduce-based compression algorithm, which improves inference and query efficiency and addresses the security of semantic information; (3) we propose optimized RDFS and OWL inference algorithms and an optimized SPARQL query algorithm that are better suited to Bigtable, and implement MSIPIE on top of Hadoop and HBase.
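The abstract does not specify the compression scheme or table layout. As an illustration only, the sketch below assumes a common approach: dictionary encoding of RDF terms to integer IDs, with each triple stored as a fixed-width row key in predicate-object-subject order. A sorted in-memory list stands in for a Bigtable-style sorted row store; no HBase API is used. The names `TermDictionary`, `TripleTable`, and `subjects` are hypothetical.

```python
import struct
from bisect import bisect_left


class TermDictionary:
    """Maps each RDF term to a small integer ID and back."""

    def __init__(self):
        self._ids = {}
        self._terms = []

    def encode(self, term):
        if term not in self._ids:
            self._ids[term] = len(self._terms)
            self._terms.append(term)
        return self._ids[term]

    def lookup(self, term):
        """Return the ID of a known term, or None if it was never encoded."""
        return self._ids.get(term)

    def decode(self, term_id):
        return self._terms[term_id]


class TripleTable:
    """Sorted row keys in predicate-object-subject (POS) order."""

    def __init__(self):
        self.dictionary = TermDictionary()
        self._rows = []  # sorted list of 24-byte keys

    def add(self, s, p, o):
        enc = self.dictionary.encode
        # Big-endian unsigned 64-bit fields: byte order equals numeric order.
        key = struct.pack(">QQQ", enc(p), enc(o), enc(s))
        i = bisect_left(self._rows, key)
        if i == len(self._rows) or self._rows[i] != key:
            self._rows.insert(i, key)

    def subjects(self, p, o):
        """Prefix scan: every subject that has predicate p and object o."""
        pid, oid = self.dictionary.lookup(p), self.dictionary.lookup(o)
        if pid is None or oid is None:
            return []
        prefix = struct.pack(">QQ", pid, oid)
        i = bisect_left(self._rows, prefix)
        found = []
        while i < len(self._rows) and self._rows[i].startswith(prefix):
            sid = struct.unpack(">QQQ", self._rows[i])[2]
            found.append(self.dictionary.decode(sid))
            i += 1
        return found


if __name__ == "__main__":
    table = TripleTable()
    table.add("ex:alice", "rdf:type", "ex:Student")
    table.add("ex:bob", "rdf:type", "ex:Student")
    table.add("ex:alice", "ex:advisor", "ex:carol")
    print(table.subjects("rdf:type", "ex:Student"))
```

Because each triple is an individually addressable row, a single statement can be read, updated, or deleted without rewriting a whole file. This is the kind of fine-grained access that the abstract says a purely HDFS-based design lacks.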
Keywords/Search Tags: Semantic Web, cloud computing, parallel inferencing, OWL, RDF, SPARQL