
Research on Storage Strategies and Optimization of the Hadoop Platform

Posted on: 2013-06-02
Degree: Master
Type: Thesis
Country: China
Candidate: J C Gao
Full Text: PDF
GTID: 2248330371978014
Subject: Computer Science and Technology
Abstract/Summary:
With the development of the economy, society, science and technology, digital information is growing explosively. Informatization, the growth of the Internet and cheap storage equipment provide both the impetus and the physical basis for mass information storage. When the volume of data is small, it is easy to store and back up. However, once the volume rises to the TB or even PB level, storage and backup become difficult problems. Furthermore, users' requirements for the efficiency and security of data storage keep rising. More and more people are concerned with how to store and read data efficiently, and at present cloud computing appears to be a good choice: it is an effective way to enhance data security and improve access speed. Hadoop is a popular cloud computing framework that offers high reliability, high efficiency, high scalability and high fault tolerance. In addition, as an open-source framework, it is well suited to scientific research and application. We therefore choose the Hadoop framework as the object of our research on cloud computing.

Focusing on how to store mass data efficiently, and drawing on experience with the Hadoop platform, this paper analyzes the principles of HDFS (Hadoop Distributed File System) and its storage strategy, identifies the limitations and problems of the data storage strategy in HDFS, and finally arrives at an optimized storage strategy for HDFS, called DIFT. The DIFT storage strategy is based on more complete state information about the data nodes.
It can raise the utilization of the cluster's disks and network bandwidth, reduce the likelihood of bottlenecks, improve system performance, and provide better load balancing for the cluster and a better experience for users.

The content of the paper is as follows. First, the principles of the HDFS model in Hadoop are introduced and analyzed, including the control node, the data nodes, the data structure of file blocks, and the relationships among interfaces, classes and methods. The operating principles and functions of HDFS are also analyzed in this part. Second, the DIFT storage strategy is designed on the basis of the data structures, status information, heartbeat protocol and so on. Then the Hadoop code is compiled with the DIFT storage strategy. Finally, the DIFT strategy is applied to a Hadoop cluster and its effect is evaluated experimentally. The DIFT storage strategy is configurable: its design takes users' actual situations into account, so users can configure the strategy according to their own needs. Experiments show that the DIFT storage strategy improves the storage efficiency of HDFS and achieves highly efficient mass data storage on the platform.

HDFS can build stable cloud computing platforms from cheap machines and, combined with the efficient DIFT storage strategy, can meet the needs of practical applications well. It can serve as a data center platform in enterprises and universities. Furthermore, the development cycle can be shortened by configuring a suitable strategy and threshold values in the storage strategy.
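The abstract does not spell out how DIFT chooses data nodes, so the following Python sketch is only a hypothetical illustration of the general idea it describes: selecting storage targets from per-node state information under user-configurable thresholds. The `NodeStatus` fields, the `choose_targets` function, the weighting formula and all numeric values are assumptions made for this example; they are not taken from the thesis or from the Hadoop code base, and the sketch ignores rack awareness and the other factors that real HDFS block placement considers.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """Hypothetical per-datanode state, as might be reported in a heartbeat."""
    name: str
    disk_used_fraction: float      # 0.0 (empty) .. 1.0 (full)
    network_used_fraction: float   # 0.0 (idle) .. 1.0 (saturated)


def choose_targets(nodes, replicas=3, disk_threshold=0.9,
                   network_threshold=0.8, disk_weight=0.5):
    """Return the names of up to `replicas` nodes with the lowest combined load.

    Nodes at or above either threshold are skipped. The thresholds and the
    weight are illustrative, user-configurable values (assumptions).
    """
    eligible = [
        n for n in nodes
        if n.disk_used_fraction < disk_threshold
        and n.network_used_fraction < network_threshold
    ]

    def load(n):
        # Weighted mix of disk and network usage; lower means less loaded.
        return (disk_weight * n.disk_used_fraction
                + (1.0 - disk_weight) * n.network_used_fraction)

    return [n.name for n in sorted(eligible, key=load)[:replicas]]


cluster = [
    NodeStatus("dn1", 0.95, 0.10),  # disk above threshold: filtered out
    NodeStatus("dn2", 0.40, 0.30),
    NodeStatus("dn3", 0.20, 0.70),
    NodeStatus("dn4", 0.60, 0.20),
    NodeStatus("dn5", 0.30, 0.85),  # network above threshold: filtered out
]
targets = choose_targets(cluster)
print(targets)
```

Filtering on thresholds before ranking keeps nearly full or saturated nodes from becoming bottlenecks, while the weight lets a user bias the choice toward disk or network headroom, which is one plausible reading of the configurability the abstract mentions.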
Keywords/Search Tags: Cloud computing, Hadoop, HDFS, Storage, Strategy, Datanode