
The Hadoop Cloud Storage Strategy And Optimization

Posted on: 2014-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhu
Full Text: PDF
GTID: 2248330398957585
Subject: Computer application technology
Abstract/Summary:
As cloud computing has gained wide acceptance in industry, cloud storage systems have developed alongside it, and more and more enterprises and research institutions are building their own cloud storage systems on cloud platforms. Among the many available platforms, Hadoop has been widely adopted by enterprises including Yahoo, Facebook and IBM. Hadoop stores its data mainly in the HDFS distributed file system, so research on HDFS has become the basis on which many companies build their cloud storage systems.

The default HDFS storage strategy handles big-data storage effectively but has several deficiencies in real applications. First, the Datanode status information it maintains is incomplete and data nodes are selected at random, so the Namenode's choice of storage nodes is prone to unbalancing the system load. Second, when a randomly selected remote node is used for storage, the network distance between nodes can make data transfer take too long and degrade system performance. Third, the default policy uses a fixed number of data copies, which can cause data redundancy and reduce the load capacity of the system. Solving these problems can improve HDFS storage performance to some extent.

Based on an analysis of these shortcomings of the default HDFS policy, and drawing on existing solutions, this thesis designs an optimized HDFS storage policy. First, the policy completes the status information of the data nodes, which gives the control node more evidence when it selects data nodes for storage. Second, the policy allows the data copy factor to be set according to the actual needs of the user. Third, instead of selecting data nodes purely at random, the policy uses an evaluation value, calculated for each node from the load of the node and the network distance between the local node and that node. Finally, the optimized strategy is deployed on a simulation platform to verify its feasibility and to test its efficiency.

Experiments show that the optimized strategy improves the storage performance of the system: it balances the load between nodes effectively, reduces the likelihood of system bottlenecks, and improves the user experience. The HDFS distributed file system runs on a Hadoop cloud platform built from inexpensive PCs and, configured with the optimized storage strategy, meets practical needs well. With these advantages, it can serve as a data center for enterprises as well as colleges and universities. Because the optimized strategy is highly configurable, users can configure it to suit different needs in practice, which shortens the development cycle for companies and universities.
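The evaluation-value idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual formula, so everything specific here is an assumption: the `DataNode` fields, the weighted sum in `evaluation_value`, the `load_weight` parameter, and the choice to rank all candidates rather than evaluate a random sample are hypothetical and are not taken from the thesis or from the Hadoop API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNode:
    """Hypothetical status record for one data node (not a Hadoop class)."""
    name: str
    load: float     # assumed metric: fraction of capacity in use, 0.0 to 1.0
    distance: int   # assumed metric: hop-count style distance from the writer


def evaluation_value(node, max_distance, load_weight=0.5):
    """Return a score where lower is better.

    Assumed formula: a weighted sum of the node load and the network
    distance normalized by the largest distance among the candidates.
    """
    norm_distance = node.distance / max_distance if max_distance else 0.0
    return load_weight * node.load + (1.0 - load_weight) * norm_distance


def select_nodes(nodes, copy_factor, load_weight=0.5):
    """Pick `copy_factor` nodes with the lowest evaluation values.

    `copy_factor` stands in for the user-configurable data copy factor.
    Ties are broken by node name so the result is deterministic.
    """
    if copy_factor < 1:
        raise ValueError("copy_factor must be at least 1")
    max_distance = max((n.distance for n in nodes), default=0)
    ranked = sorted(
        nodes,
        key=lambda n: (evaluation_value(n, max_distance, load_weight), n.name),
    )
    return ranked[:copy_factor]


if __name__ == "__main__":
    candidates = [
        DataNode("dn1", load=0.9, distance=0),  # local but heavily loaded
        DataNode("dn2", load=0.2, distance=2),  # nearby and lightly loaded
        DataNode("dn3", load=0.4, distance=4),
        DataNode("dn4", load=0.1, distance=4),  # far but nearly idle
    ]
    chosen = select_nodes(candidates, copy_factor=2)
    print([n.name for n in chosen])  # prints ['dn2', 'dn1']
```

With equal weights, the lightly loaded nearby node `dn2` ranks ahead of the busy local node `dn1`. Raising `load_weight` shifts the choice toward idle nodes even when they are far away, which is the kind of trade-off a configurable strategy would expose.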
Keywords/Search Tags: Cloud Computing, Cloud Storage, HDFS, Storage Strategy, Optimizing