Research And Application Of Distributed Storage Technology Based On Semantic Metadata

Posted on:2017-07-31

Degree:Master

Type:Thesis

Country:China

Candidate:Y Y Wang

Full Text:PDF

GTID:2348330503488911

Subject:Computer application technology

Abstract/Summary:

With the arrival of the big data era, storing data and retrieving the information users want quickly and accurately have become increasingly difficult. The Semantic Web and the distributed platform Hadoop can effectively address the difficulties of data storage and data acquisition, but their adoption has produced large-scale semantic metadata, which poses a great challenge for data management. Building a practical distributed storage system for semantic metadata is therefore increasingly important for promoting the analysis and application of big data. First, the thesis introduces the research background and status of the Semantic Web and of storage technology for RDF semantic metadata, discusses the importance and significance of the research, and on this basis describes the related technologies, including semantic metadata, the Resource Description Framework (RDF), Hadoop, and HBase. Second, it analyzes the existing problems in storing RDF semantic data and proposes an RDF semantic metadata storage strategy based on HBase. The core of the strategy is an optimized design of the HBase Rowkey that takes data loading, data deduplication, and query response into account together: the predicate of each RDF triple is hashed, and the predicate and its hash value are placed in the Rowkey. Third, the thesis puts forward data loading, data deduplication, and data query algorithms based on the optimized RDF storage strategy. Data loading is performed mainly with HBase's own data loading tool. The deduplication algorithm applies fuzzy c-means clustering, with the initial values of the cluster centers obtained by scanning the predicate table. The query method abstracts the three components of RDF data separately and, following the basic graph pattern query approach, finds the relevant nodes and edges by evaluating the query conditions; the nodes are then scored and sorted, and the top K are returned as the final result. Finally, experiments are carried out on a small cluster using LUBM, the most commonly used test data set in the Semantic Web field. Analysis of the experimental results for each evaluation index shows that the proposed storage strategy and algorithms are feasible.
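
The abstract does not give the exact Rowkey layout, so the following Python sketch only illustrates the general idea of placing a predicate and its hash in a row key. The function names, the SHA-1 hash, the 8-character prefix, the "|" separator, and the appended subject are all assumptions made for illustration and are not taken from the thesis.

```python
import hashlib


def predicate_hash(predicate: str, length: int = 8) -> str:
    """Return a short fixed-length hex digest of a predicate URI.

    SHA-1 and the 8-character length are illustrative choices only.
    """
    return hashlib.sha1(predicate.encode("utf-8")).hexdigest()[:length]


def make_rowkey(predicate: str, subject: str, sep: str = "|") -> bytes:
    """Build a hypothetical row key: <hash(predicate)>|<predicate>|<subject>.

    The hash prefix has a fixed width and spreads predicates across the key
    space; the subject is appended (an assumption) so that each key is unique
    per (predicate, subject) pair.
    """
    parts = [predicate_hash(predicate), predicate, subject]
    return sep.join(parts).encode("utf-8")


# Made-up example triples as (subject, predicate, object).
triples = [
    ("http://example.org/student1", "http://example.org/advisor", "http://example.org/prof1"),
    ("http://example.org/student2", "http://example.org/takesCourse", "http://example.org/course1"),
    ("http://example.org/student2", "http://example.org/advisor", "http://example.org/prof2"),
]

# HBase stores rows in lexicographic key order, so sorting the keys here
# mimics how rows that share a predicate would end up adjacent.
rowkeys = sorted(make_rowkey(p, s) for s, p, _ in triples)
for key in rowkeys:
    print(key.decode("utf-8"))
```

Because every key for a given predicate begins with the same hash prefix, a prefix scan over that hash would reach all rows for the predicate in one contiguous range, which is one plausible reason for a design of this kind.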

Keywords/Search Tags:

big data, distributed storage, Semantic Web, RDF semantic metadata, HBase

Related items

1	Research And Implementation Of Storage And Query Techniques On Massive RDF Data
2	Research And Implementation Of Large Collections Of RDF Data Storage And Retrieval Technology On HBase
3	Research On Metadata Organization Approach For Image Storage Systems Towards Content-based Semantic Similarity Query
4	Research On RDF Data Storage And Query Based On HBase
5	Design And Implement Of The Semantic Sensor Data Management System Based On HBase
6	Research A Model Of The Metadata Hierarchical Storage In The Distributed Data Register Center Based On The DOA
7	Research On Metadata Management Tool Based On Semantic Analysis
8	Design And Implementation Of Metadata Management Tool Based On Semantic Analysis
9	Research Of File System Metadata Graph
10	Ontology Storage And Query Based On HBase