
Research And Application Of Distributed Storage Technology Based On Semantic Metadata

Posted on: 2017-07-31; Degree: Master; Type: Thesis
Country: China; Candidate: Y Y Wang; Full Text: PDF
GTID: 2348330503488911; Subject: Computer application technology
Abstract/Summary:
With the arrival of the big data era, storing data and retrieving the information users want quickly and accurately has become increasingly difficult. The Semantic Web and the distributed platform Hadoop can effectively address these storage and retrieval difficulties, but their adoption has produced large-scale semantic metadata, which in turn poses a great challenge for data management. Building a practical distributed storage system for semantic metadata is therefore increasingly important for promoting the analysis and application of big data. First, the paper introduces the research background and current status of the Semantic Web and of storage technology for RDF semantic metadata, and discusses the significance of the research. On this basis, it describes related technologies such as semantic metadata, the Resource Description Framework (RDF), Hadoop, and HBase. Second, it analyzes the existing problems in storing RDF semantic data and proposes an RDF semantic metadata storage strategy based on HBase. The strategy centers on optimizing how data is stored in the HBase Rowkey: taking data loading, data deduplication, and query response into account together, it hashes the predicate of the RDF data and places both the predicate and its hash value in the Rowkey. Third, the paper presents data loading, data deduplication, and data query algorithms built on the optimized RDF storage strategy. Data loading is carried out mainly with HBase's own data loading tool. Data deduplication uses the fuzzy c-means clustering algorithm, with the initial values of the cluster centers obtained by scanning the predicate table.
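The Rowkey idea described above (hash the predicate, then store the predicate together with its hash in the Rowkey) can be illustrated with a minimal sketch. The abstract does not give the exact layout, so everything concrete here is an assumption made for illustration: the MD5 hash, the 8-character prefix, the `|` separator, the hash-first ordering, and the hypothetical helper names `predicate_hash`, `make_rowkey`, and `triple_to_cell`.

```python
import hashlib
from typing import Tuple


def predicate_hash(predicate: str, length: int = 8) -> str:
    """Return a short, deterministic hex digest of the predicate (assumed: MD5)."""
    return hashlib.md5(predicate.encode("utf-8")).hexdigest()[:length]


def make_rowkey(predicate: str, sep: str = "|") -> bytes:
    """Build a rowkey holding both the hash value and the predicate itself.

    Assumed layout: <hash prefix><sep><predicate>. The thesis may order or
    encode these parts differently.
    """
    return f"{predicate_hash(predicate)}{sep}{predicate}".encode("utf-8")


def triple_to_cell(subject: str, predicate: str, obj: str) -> Tuple[bytes, str, str]:
    """Map one RDF triple to (rowkey, column qualifier, value).

    Using the subject as qualifier and the object as value is a hypothetical
    choice, not something stated in the abstract.
    """
    return make_rowkey(predicate), subject, obj


if __name__ == "__main__":
    print(triple_to_cell("ex:Student1", "ub:takesCourse", "ex:Course7"))
```

Under this assumed layout, all triples sharing a predicate land under the same rowkey, and the fixed-length hash prefix spreads different predicates across the key space.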
The data query method abstracts the three components of RDF data separately. Following the basic graph pattern query approach, it finds the relevant nodes and edges by evaluating the query conditions, then scores and ranks the nodes so that the top K results are returned as the final output. Finally, the paper runs experiments on LUBM, the most commonly used test dataset for the Semantic Web, on a small cluster. Analysis of the experimental results for each evaluation metric shows that the proposed storage strategy and algorithms are feasible.
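The query step (match a pattern, score the matching nodes, return the top K) might look roughly like the following in-memory sketch. The abstract does not define the scoring function, so the score used here, the number of matching edges per subject, is an assumption, as are the helper names `matches` and `top_k_subjects` and the use of `None` as a pattern variable.

```python
import heapq
from collections import Counter
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def matches(triple: Triple, pattern: Pattern) -> bool:
    """A triple matches when every bound (non-None) pattern slot is equal."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


def top_k_subjects(triples: Iterable[Triple], pattern: Pattern, k: int) -> List[Tuple[str, int]]:
    """Score each subject by its number of matching edges and return the k best.

    Ties are broken by subject name so the result is deterministic.
    """
    scores = Counter(t[0] for t in triples if matches(t, pattern))
    return heapq.nlargest(k, scores.items(), key=lambda kv: (kv[1], kv[0]))


if __name__ == "__main__":
    data = [
        ("a", "type", "Student"),
        ("a", "takes", "c1"),
        ("a", "takes", "c2"),
        ("b", "takes", "c1"),
    ]
    print(top_k_subjects(data, (None, "takes", None), 1))
```

A full basic graph pattern query would join several such triple patterns on shared variables; this sketch covers only a single pattern to keep the ranking step visible.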
Keywords/Search Tags: big data, distributed storage, Semantic Web, RDF semantic metadata, HBase