
Research on HDFS Replica Storage Optimization and Mass Data Storage

Posted on: 2016-11-02
Degree: Master
Type: Thesis
Country: China
Candidate: X B Chen
Full Text: PDF
GTID: 2308330464456837
Subject: Software engineering
Abstract/Summary:
With the rapid development and widespread adoption of the Internet, ever-increasing volumes of data are generated on the network, and this data carries great value. Social networks, online e-commerce, and mobile communications in particular now produce semi-structured and unstructured information measured in petabytes, with hundreds of millions of records created every day. Storing and managing this data has become both important and difficult: the old model of manual records and the traditional relational database can no longer keep pace with the storage and management demands of big data. How to store and manage large-scale data is therefore the main subject of this thesis, and mining the value hidden in big data is the larger goal; big data has become a new challenge, and a great opportunity, for modern society.

This thesis first examines the shortcomings of HDFS's multi-replica block storage and proposes a probability model to address them. The model approaches multi-replica storage from a mathematical point of view: it predicts replica availability in order to calculate how many replicas each data block requires, and then creates that number of replicas. Since the resulting placement must also be load-balanced, the model is further refined by using a consistent hashing algorithm to place the replicas, which achieves load balancing across the cluster. On the basis of this optimization, the thesis then studies a solution for large-scale data storage.
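The two ideas above can be sketched briefly. The thesis does not give its exact formulas, so the following is an assumption-laden illustration: the replica count is sized with the standard independent-availability model (n replicas of a block, each available with probability p, give availability 1 - (1-p)^n), and placement uses a minimal consistent-hash ring. Node names, the hash function (MD5), and the virtual-node count are all illustrative choices, not taken from the thesis.

```python
import bisect
import hashlib
import math

def replicas_needed(p_node, target_availability):
    """Smallest n with 1 - (1 - p_node)**n >= target_availability.

    A generic probability model for sizing replica counts; the thesis's
    own model may differ.
    """
    return math.ceil(math.log(1.0 - target_availability) /
                     math.log(1.0 - p_node))

class ConsistentHashRing:
    """Minimal consistent-hash ring mapping block IDs to DataNodes."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes  # virtual nodes per physical node
        self._keys = []       # sorted virtual-node hash positions
        self._map = {}        # hash position -> physical node
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        # MD5 used only as a well-spread placement hash, not for security
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        # Each physical node owns many points on the ring, which is
        # what spreads blocks evenly (the load-balancing property).
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            bisect.insort(self._keys, h)
            self._map[h] = node

    def get_node(self, block_id):
        """Walk clockwise from the block's hash to the next node point."""
        h = self._hash(block_id)
        idx = bisect.bisect(self._keys, h) % len(self._keys)
        return self._map[self._keys[idx]]
```

For example, `replicas_needed(0.5, 0.99)` reports that seven replicas are needed when each node is only 50% available, while a ring of three DataNodes deterministically assigns every block ID to one node, and adding or removing a node only remaps the blocks adjacent to it on the ring.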
For the large-scale storage study, this thesis uses HBase. HBase's column-oriented storage has a simple structure and is convenient for storing big data, but under its native storage scheme a growing data volume repeatedly triggers its split and compaction mechanisms, which greatly degrades storage performance. The improvement proposed here is to combine HBase with HDFS for big data storage: large data files are stored in HDFS, while their indexes are stored in HBase.

To verify the two proposed improvements, experiments were conducted on a Hadoop cluster built from eight virtual machines running Linux. The results show that the probability-model-based HDFS replica placement strategy outperforms the system's default strategy, with a clear reduction in storage time at three replicas. The improved HBase storage strategy likewise shows increasingly clear gains in storage efficiency as the volume of collected data grows.
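The hybrid scheme splits each write in two: the bulk payload goes to HDFS, and only a small index row goes to HBase, so HBase regions stay small and split/compact far less often. The sketch below models that division of labor with plain dictionaries standing in for the two systems; the class name, path scheme, and metadata fields are hypothetical, and a real deployment would use the HDFS and HBase client APIs instead.

```python
import hashlib

class HybridStore:
    """Toy model of the thesis's hybrid scheme: large files live in an
    HDFS-like blob store; HBase-like rows hold only a small index."""

    def __init__(self):
        self._hdfs = {}   # simulated HDFS: path -> file bytes
        self._hbase = {}  # simulated HBase table: row key -> index metadata

    def put(self, name, data):
        # Bulk payload goes to HDFS under a content-addressed path
        path = f"/data/{hashlib.md5(name.encode()).hexdigest()}"
        self._hdfs[path] = data
        # HBase row stores only the index, never the payload itself,
        # so region size (and split/compaction pressure) stays small
        self._hbase[name] = {"hdfs_path": path, "size": len(data)}
        return path

    def get(self, name):
        meta = self._hbase[name]              # cheap index lookup
        return self._hdfs[meta["hdfs_path"]]  # bulk read from HDFS
```

The key design point is that HBase row size is now independent of file size: however large the files grow, each row holds only a path and a few bytes of metadata.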
Keywords/Search Tags: HBase, Hadoop, Big data, HDFS, MapReduce