Research On Key Technologies Of Large-Scaled Semantic Web Ontologies Querying And Reasoning Based On Hadoop

Posted on: 2014-12-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R Li
GTID: 1268330392971923
Subject: Computer Science and Technology
Abstract/Summary:
The Semantic Web, proposed by Tim Berners-Lee, is a vision of the next generation of the Web. By bringing the concept of ontology from philosophy into computer science, it allows computers to understand the information published on the Semantic Web and to exchange semantic information with one another. In the semantic web stack proposed by the World Wide Web Consortium (W3C), SPARQL-based querying of Resource Description Framework (RDF) data, description logic-based Web Ontology Language (OWL) reasoning, and Semantic Web Rule Language (SWRL)-based rule reasoning over OWL ontologies are the core topics of semantic web research.

However, large-scale ontologies have emerged with the spread of semantic web technologies, and their volume is growing rapidly every year. Conventional semantic web querying and reasoning tools do not scale well to such volumes because they are designed to run on a single machine. In recent years, cloud computing has become an active research area in both academia and the IT industry because of its performance and scalability for storing and processing large-scale data, and Hadoop has become the de facto standard for Big Data processing. Several researchers have started to combine cloud computing with semantic web technologies to explore high-performance ontology querying and reasoning in a distributed setting. However, this research area is still at an early stage, and many key problems remain to be solved.

To address these problems, this thesis studies approaches to distributed querying and reasoning over large-scale ontology data using cloud computing technologies. It can establish a theoretical basis for implementing cloud services that manage large-scale semantic web ontology data in the future.
The main research contents and innovative results are as follows.

(1) Based on the semantic web stack proposed by the W3C, the MapReduce distributed computing model and the HBase distributed database, this thesis proposes an architecture for a large-scale semantic web ontology data management cloud service. First, the architecture is designed around the querying and reasoning functions; from bottom to top, its layers are the physical layer, storage layer, data layer, logical layer, interface layer, network layer and application layer. Then the logical layer, which is the core component of the proposed architecture, is designed to consist of a data preprocessor, a data adapter, a querying and reasoning analyser, a querying and reasoning plan generator, a MapReduce SPARQL querying engine, a MapReduce SWRL rule reasoning engine and a MapReduce Tableau reasoning engine. The proposed framework provides a complete architecture and data exchange workflow for implementing a high-performance, scalable ontology data management cloud service in the future, and it lays the foundation for the research on the key technologies.

(2) Based on the features of RDF triples and the formal semantics of description logic-based OWL ontologies, this thesis proposes a novel storage solution for large-scale semantic web ontologies built on the HBase distributed database schema. The ontologies are stored in three HBase tables named T_OS_P, T_PO_S and T_SP_O. The MapReduce-based querying and reasoning approach over these tables is analysed as well. A comparison with existing ontology storage schemas demonstrates that the proposed schema balances storage space against performance.

(3) Based on the syntax and semantics of SPARQL and the features of MapReduce key-value pairs, this thesis proposes a novel MapReduce-based distributed querying approach for SPARQL graph patterns over large-scale RDF data.
First, the author defines several data models to represent RDF data and SPARQL queries. Second, to reduce the number of MapReduce jobs and improve performance, a query plan generation algorithm is proposed that determines the jobs using a greedy selection strategy. Furthermore, several query algorithms are presented to answer SPARQL graph pattern queries in the MapReduce paradigm. An experiment in a simulated cloud computing environment shows that the approach is more scalable and efficient than traditional approaches when storing and retrieving large volumes of RDF data.

(4) Based on the semantics of OWL Lite ontologies and the Tableau algorithm for the description logic SHIF, this thesis proposes a novel MapReduce-based distributed Tableau reasoning approach to check the consistency of large OWL ontologies. First, by exploiting the MapReduce paradigm, OWL individual assertions are partitioned into multiple independent units in the form of key-value pairs, and then the consistency of each unit with respect to the OWL terminology is checked in parallel. Last, using the LUBM benchmark and a comparison with the Pellet, RacerPro and HermiT reasoners, an experiment in a simulated cloud computing environment shows that the approach is more scalable and efficient than traditional tools when reasoning over large-scale OWL ontologies.

(5) Based on the syntax and semantics of SWRL rules, this thesis proposes a novel MapReduce-based distributed SWRL reasoning approach. First, novel data models for representing SWRL rules and intermediate key-value data are defined. Second, a distributed SWRL reasoning algorithm based on the MapReduce paradigm is proposed under the DL-safe restriction. Last, using the LUBM benchmark and self-defined SWRL rules, an experiment in a simulated environment shows that the approach is more efficient and scalable than the conventional rule engines Jess and Pellet when reasoning over large-scale OWL data.
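The abstract names the three tables in item (2) but does not spell out their layout. The following is a minimal in-memory sketch under one plausible reading, which is an assumption and not taken from the thesis: each table uses two triple components as the row key and stores the third as the value (T_SP_O is keyed by subject and predicate, T_PO_S by predicate and object, T_OS_P by object and subject). Under that reading, any triple pattern with at least one bound term can be answered by a key or key-prefix lookup in exactly one table. The names `TripleStore` and `choose_table` are hypothetical, and Python dictionaries stand in for HBase.

```python
from collections import defaultdict


def choose_table(s=None, p=None, o=None):
    """Pick the table whose row key starts with the bound terms (None = variable)."""
    if s is not None and p is not None:
        return "T_SP_O"
    if p is not None and o is not None:
        return "T_PO_S"
    if o is not None and s is not None:
        return "T_OS_P"
    if s is not None:
        return "T_SP_O"
    if p is not None:
        return "T_PO_S"
    if o is not None:
        return "T_OS_P"
    return "T_SP_O"  # nothing bound: a full scan of any table works


class TripleStore:
    """Assumed layout: row key = two components, cell value = the third."""

    def __init__(self):
        names = ("T_SP_O", "T_PO_S", "T_OS_P")
        self.tables = {name: defaultdict(set) for name in names}

    def add(self, s, p, o):
        # Every triple is written once per table, in a different key order.
        self.tables["T_SP_O"][(s, p)].add(o)
        self.tables["T_PO_S"][(p, o)].add(s)
        self.tables["T_OS_P"][(o, s)].add(p)

    def match(self, s=None, p=None, o=None):
        """Return the sorted (s, p, o) triples matching a pattern."""
        name = choose_table(s, p, o)
        # Reorder the pattern into the chosen table's (key1, key2, value) order.
        first, second, third = {
            "T_SP_O": (s, p, o),
            "T_PO_S": (p, o, s),
            "T_OS_P": (o, s, p),
        }[name]
        results = []
        # A real store would do a key or prefix scan; this loop only emulates it.
        for (k1, k2), values in self.tables[name].items():
            if first is not None and k1 != first:
                continue
            if second is not None and k2 != second:
                continue
            for v in values:
                if third is not None and v != third:
                    continue
                # Map (key1, key2, value) back to (s, p, o) order.
                triple = {
                    "T_SP_O": (k1, k2, v),
                    "T_PO_S": (v, k1, k2),
                    "T_OS_P": (k2, v, k1),
                }[name]
                results.append(triple)
        return sorted(results)
```

Storing each triple three times costs extra space, which is consistent with the abstract's remark that the schema trades storage space against performance; the actual row key and column design in the thesis may differ.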
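Item (3) mentions a greedy query plan generation algorithm that reduces the number of MapReduce jobs, without giving its details. As an illustration only, the sketch below uses one common greedy heuristic, which is an assumption rather than the thesis's algorithm: each job joins on a single variable, and at every step the variable shared by the most remaining inputs is chosen so that as many triple patterns as possible are merged per job. The function name `plan_join_jobs` and the tuple encoding of triple patterns are hypothetical.

```python
def plan_join_jobs(patterns):
    """Greedily plan join jobs for a list of triple patterns.

    Each pattern is a tuple of strings; terms starting with '?' are variables.
    Returns one (join_variable, inputs_merged) pair per planned job.
    """
    # Track only the variables carried by each pattern or intermediate result.
    groups = [frozenset(t for t in pat if t.startswith("?")) for pat in patterns]
    jobs = []
    while len(groups) > 1:
        counts = {}
        for group in groups:
            for var in group:
                counts[var] = counts.get(var, 0) + 1
        if not counts:
            break  # no variables at all, nothing to join on
        # Most widely shared variable first; ties go to the alphabetically first.
        var, shared_by = max(sorted(counts.items()), key=lambda kv: kv[1])
        if shared_by < 2:
            break  # the remaining inputs share no variable
        joined = [g for g in groups if var in g]
        rest = [g for g in groups if var not in g]
        # The job's output carries every variable of the inputs it merged.
        groups = rest + [frozenset().union(*joined)]
        jobs.append((var, len(joined)))
    return jobs


# A LUBM-style query: three patterns share ?x, and one more constrains ?c.
query = [
    ("?x", "rdf:type", "ub:Student"),
    ("?x", "ub:name", "?n"),
    ("?x", "ub:takesCourse", "?c"),
    ("?c", "rdf:type", "ub:Course"),
]
plan = plan_join_jobs(query)
```

For this query the sketch plans two jobs, a three-way join on `?x` followed by a join on `?c`, where joining the four patterns pairwise would take three.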
Keywords/Search Tags: Semantic Web, Cloud Computing, Ontology, Hadoop, MapReduce