The Design And Application Of Storage And Query Of Big Data Based On Cloud Platform

Posted on: 2018-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: X L Luo
GTID: 2348330518496340
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, the volume of data is surging, and most of that data is heterogeneous. The successful application of knowledge graphs in information search has promoted research on the fusion, storage, and query of heterogeneous data. An ontology marks each resource on the Internet with a unique identifier and can add properties to a resource or build relations between resources, which makes it flexible and extensible. With the growth of the Semantic Web, ontologies are recognized as an effective solution and are widely used to express heterogeneous data, and a growing body of computer science research addresses ontology-based data management and applications.

Traditional information storage methods split heterogeneous data into different tables according to data type, which leads to information loss. As network scale and the number of data sources increase, traditional databases and stand-alone environments struggle to support the storage and query of big data. More and more research based on cloud platforms and distributed systems has therefore been proposed to solve this problem. Although this research is not yet mature, it has great research significance and development prospects.

Based on the cloud platform Hadoop and the non-relational database HBase, this thesis studies the integration, storage, and query of massive heterogeneous data. The main work is as follows.

1. As the basis for the subsequent distributed storage and query, the fusion of multi-source heterogeneous data is implemented. Using the parallel computing framework MapReduce, this thesis implements parallel ontology construction and fusion. In the construction phase, data from each source are built into a corresponding ontology. In the fusion phase, these ontologies are merged into a single semantically rich ontology.

2. With the explosive growth of data, the bottleneck of traditional storage methods is increasingly prominent in two areas: data import and memory requirements. Drawing on recently proposed distributed RDF data storage schemes, this thesis proposes an HBase-based storage model that takes both occupied storage space and query response speed into account.

3. Based on this HBase storage model, query strategies are designed for triple pattern queries, basic graph pattern queries, and keyword queries. The triple pattern query is the foundation of the basic graph pattern query; its response speed is determined by the storage schema and database performance. In addition, by analyzing the structure of complex basic graph pattern queries, an optimization method based on join operations is proposed. Keyword query improves the usability of the query engine and is implemented by building on the results for basic graph pattern queries. The effectiveness and efficiency of the proposed strategies are verified by experiments on LUBM datasets.
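The abstract does not spell out the storage schema or the join strategy, so the following is only an illustrative sketch of how triple pattern queries and basic graph pattern queries can relate to a sorted key-value layout. It assumes a common scheme from the distributed RDF literature (three key-only tables holding the SPO, POS, and OSP permutations of each triple), simulates each table with a sorted Python list instead of calling HBase, and uses hypothetical names (`TripleStore`, `match`, `query_bgp`) and LUBM-style sample data that do not come from the thesis.

```python
from bisect import bisect_left

SEP = "\x00"  # assumed key separator; terms are assumed not to contain it


class TripleStore:
    """In-memory stand-in for three sorted, key-only tables (SPO, POS, OSP)."""

    # Each order lists which triple position fills each slot of the row key.
    ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}

    def __init__(self, triples):
        unique = set(triples)
        self.tables = {
            name: sorted(SEP.join(t[i] for i in order) for t in unique)
            for name, order in self.ORDERS.items()
        }

    def match(self, s=None, p=None, o=None):
        """Answer one triple pattern; None marks an unbound position."""
        pattern = (s, p, o)

        def bound_prefix(order):
            # Number of leading key slots fixed by the pattern.
            n = 0
            for i in order:
                if pattern[i] is None:
                    break
                n += 1
            return n

        # Pick the table whose row key starts with the most bound terms,
        # so the lookup becomes a prefix range scan.
        name = max(self.ORDERS, key=lambda n: bound_prefix(self.ORDERS[n]))
        order = self.ORDERS[name]
        k = bound_prefix(order)
        prefix = SEP.join(pattern[i] for i in order[:k])
        if 0 < k < 3:
            prefix += SEP
        rows = self.tables[name]
        results = []
        for key in rows[bisect_left(rows, prefix):]:
            if not key.startswith(prefix):
                break  # left the prefix range
            parts = key.split(SEP)
            triple = [None, None, None]
            for slot, i in enumerate(order):
                triple[i] = parts[slot]
            # Residual check guards against partial matches on the last term.
            if all(b is None or b == v for b, v in zip(pattern, triple)):
                results.append(tuple(triple))
        return results


def is_var(term):
    return term.startswith("?")


def query_bgp(store, patterns):
    """Evaluate a basic graph pattern by joining patterns on shared variables."""
    solutions = [{}]
    for pat in patterns:
        next_solutions = []
        for binding in solutions:
            # Substitute already-bound variables so the scan prefix is longer.
            concrete = [binding.get(t) if is_var(t) else t for t in pat]
            for triple in store.match(*concrete):
                extended = dict(binding)
                consistent = True
                for term, value in zip(pat, triple):
                    if is_var(term) and extended.setdefault(term, value) != value:
                        consistent = False
                        break
                if consistent:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Hypothetical LUBM-style sample data.
triples = [
    ("alice", "type", "GraduateStudent"),
    ("bob", "type", "GraduateStudent"),
    ("alice", "takesCourse", "course1"),
    ("bob", "takesCourse", "course2"),
    ("prof1", "teacherOf", "course1"),
]
store = TripleStore(triples)
answers = query_bgp(store, [
    ("?x", "type", "GraduateStudent"),
    ("?x", "takesCourse", "?c"),
    ("prof1", "teacherOf", "?c"),
])
print(answers)
```

In this sketch every triple pattern reduces to a single prefix scan because each combination of bound positions is a key prefix in one of the three tables, at the cost of storing every triple three times. That is the storage space versus response speed trade-off the abstract mentions, although the model actually proposed in the thesis may differ.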
Keywords/Search Tags: RDF, storage, query, HBase, cloud platform, parallel computing