
Research on HDFS Replica Storage Optimization and Mass Data Storage

Posted on: 2016-11-02
Degree: Master
Type: Thesis
Country: China
Candidate: X B Chen
Full Text: PDF
GTID: 2308330464456837
Subject: Software engineering
Abstract/Summary:
With the rapid development and widespread adoption of the Internet, ever-increasing volumes of data are generated on the network, and this data carries great value. Social networks, online e-commerce, and mobile communications in particular now produce semi-structured and unstructured information measured in petabytes, with hundreds of millions of records created every day. Storing and managing this data has become both important and difficult: the old model of manual records and the traditional relational database can no longer keep pace with the storage and management demands of big data. How to store and manage large-scale data is therefore the main subject of this thesis, and mining the value hidden in big data is the larger goal; big data has become a new challenge, and a great opportunity, for modern society.

This thesis first examines the shortcomings of HDFS's multi-replica block storage and proposes a probability model to address them. The model approaches multi-replica storage from a mathematical point of view: it predicts replica availability in order to calculate how many replicas each data block requires, and then creates that number of replicas. Since the resulting placement must also be load-balanced, the model is further refined by using a consistent hashing algorithm to place the replicas, which achieves load balancing across the cluster. On the basis of this optimization, the thesis then studies a solution for large-scale data storage.
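The two ideas above can be sketched briefly. The thesis does not give its exact formulas, so the following is an assumption-laden illustration: the replica count is sized with the standard independent-availability model (n replicas of a block, each available with probability p, give availability 1 - (1-p)^n), and placement uses a minimal consistent-hash ring. Node names, the hash function (MD5), and the virtual-node count are all illustrative choices, not taken from the thesis.

```python
import bisect
import hashlib
import math

def replicas_needed(p_node, target_availability):
    """Smallest n with 1 - (1 - p_node)**n >= target_availability.

    A generic probability model for sizing replica counts; the thesis's
    own model may differ.
    """
    return math.ceil(math.log(1.0 - target_availability) /
                     math.log(1.0 - p_node))

class ConsistentHashRing:
    """Minimal consistent-hash ring mapping block IDs to DataNodes."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes  # virtual nodes per physical node
        self._keys = []       # sorted virtual-node hash positions
        self._map = {}        # hash position -> physical node
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        # MD5 used only as a well-spread placement hash, not for security
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        # Each physical node owns many points on the ring, which is
        # what spreads blocks evenly (the load-balancing property).
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            bisect.insort(self._keys, h)
            self._map[h] = node

    def get_node(self, block_id):
        """Walk clockwise from the block's hash to the next node point."""
        h = self._hash(block_id)
        idx = bisect.bisect(self._keys, h) % len(self._keys)
        return self._map[self._keys[idx]]
```

For example, `replicas_needed(0.5, 0.99)` reports that seven replicas are needed when each node is only 50% available, while a ring of three DataNodes deterministically assigns every block ID to one node, and adding or removing a node only remaps the blocks adjacent to it on the ring.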
For the large-scale storage study, this thesis uses HBase. HBase's column-oriented storage has a simple structure and is convenient for storing big data, but under its native storage scheme a growing data volume repeatedly triggers its split and compaction mechanisms, which greatly degrades storage performance. The improvement proposed here is to combine HBase with HDFS for big data storage: large data files are stored in HDFS, while their indexes are stored in HBase.

To verify the two proposed improvements, experiments were conducted on a Hadoop cluster built from eight virtual machines running Linux. The results show that the probability-model-based HDFS replica placement strategy outperforms the system's default strategy, with a clear reduction in storage time at three replicas. The improved HBase storage strategy likewise shows increasingly clear gains in storage efficiency as the volume of collected data grows.
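The hybrid scheme splits each write in two: the bulk payload goes to HDFS, and only a small index row goes to HBase, so HBase regions stay small and split/compact far less often. The sketch below models that division of labor with plain dictionaries standing in for the two systems; the class name, path scheme, and metadata fields are hypothetical, and a real deployment would use the HDFS and HBase client APIs instead.

```python
import hashlib

class HybridStore:
    """Toy model of the thesis's hybrid scheme: large files live in an
    HDFS-like blob store; HBase-like rows hold only a small index."""

    def __init__(self):
        self._hdfs = {}   # simulated HDFS: path -> file bytes
        self._hbase = {}  # simulated HBase table: row key -> index metadata

    def put(self, name, data):
        # Bulk payload goes to HDFS under a content-addressed path
        path = f"/data/{hashlib.md5(name.encode()).hexdigest()}"
        self._hdfs[path] = data
        # HBase row stores only the index, never the payload itself,
        # so region size (and split/compaction pressure) stays small
        self._hbase[name] = {"hdfs_path": path, "size": len(data)}
        return path

    def get(self, name):
        meta = self._hbase[name]              # cheap index lookup
        return self._hdfs[meta["hdfs_path"]]  # bulk read from HDFS
```

The key design point is that HBase row size is now independent of file size: however large the files grow, each row holds only a path and a few bytes of metadata.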
Keywords/Search Tags: HBase, Hadoop, Big data, HDFS, MapReduce