The Research On The Optimization Scheme Of Replica Distribution Strategy For Hadoop Cloud Storage Platform

Posted on: 2019-07-09
Degree: Master
Type: Thesis
Country: China
Candidate: X L Shen
GTID: 2568305615450814
Subject: Computer technology
Abstract/Summary:
As computer technology has developed, cloud computing has become a focus of attention. With growing digitization, the volume of data generated by cloud computing has increased exponentially. Cloud storage technology emerged to hold this data and has become indispensable in many fields. The essence of cloud storage is to store user data on servers, which are usually organized into clusters over the Internet. Users access these servers remotely through the network, and their data is kept in the cloud without fear of loss. Cloud storage relies heavily on the Hadoop platform. Hadoop, a distributed platform developed by Apache, has significant advantages in cloud computing and distributed storage, and many enterprises have studied its application to cloud storage. HDFS, the distributed file system of the Hadoop platform, offers strong storage and scaling capabilities. However, because business requirements and data formats vary, HDFS has some design shortcomings and must be optimized before it is applied to specific scenarios.

The main purpose of this thesis is to solve the problem of unbalanced replica distribution in HDFS. The work has two parts. The first is the implementation of a cloud storage system based on HDFS. In HDFS, file replicas are stored on different DataNodes at random, and this random placement does not follow the principle of load balancing. This thesis proposes a weighted evaluation index selection algorithm based on an optimal solution. The algorithm considers each node's load and its space utilization ratio and stores the data on the most suitable DataNode in order to achieve load balancing. The second part is the development of a Web application to build a Hadoop-based cloud storage system, followed by comparative tests of the improved HDFS against the original file system. The tests show that the improved HDFS distributes file replicas more evenly. The system designed in this thesis is based on HDFS. It improves the distribution characteristics of the distributed file system when storing files, achieves load balancing of replica distribution, and contributes to better system utilization and file security.
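The abstract does not give the formula behind the weighted selection, so the following Python sketch only illustrates the general idea of scoring nodes by load and space utilization and picking the best-scoring ones. The `DataNode` fields, the weights `w_load` and `w_space`, the linear score, and the function names are all assumptions made for illustration; they are not the thesis's algorithm and not part of the Hadoop API. The sketch also ignores rack topology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNode:
    # Hypothetical node description, not a Hadoop class.
    name: str
    load: float        # assumed: current load normalized to [0, 1]
    space_used: float  # assumed: used capacity / total capacity, in [0, 1]


def score(node, w_load=0.5, w_space=0.5):
    """Weighted evaluation index: lower means a better placement target.

    The linear form and the default weights are illustrative assumptions.
    """
    return w_load * node.load + w_space * node.space_used


def choose_targets(nodes, replicas=3, w_load=0.5, w_space=0.5):
    """Return the `replicas` distinct nodes with the lowest weighted score."""
    if replicas > len(nodes):
        raise ValueError("not enough DataNodes for the requested replica count")
    # Ties are broken by node name so the result is deterministic.
    ranked = sorted(nodes, key=lambda n: (score(n, w_load, w_space), n.name))
    return ranked[:replicas]


nodes = [
    DataNode("dn1", load=0.80, space_used=0.40),
    DataNode("dn2", load=0.20, space_used=0.30),
    DataNode("dn3", load=0.50, space_used=0.90),
    DataNode("dn4", load=0.10, space_used=0.60),
]

targets = choose_targets(nodes, replicas=2)
print([n.name for n in targets])
```

With equal weights, the lightly loaded and relatively empty nodes `dn2` and `dn4` are selected. Shifting weight toward `w_space` favors nodes with more free capacity, while shifting it toward `w_load` favors idle nodes.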
Keywords/Search Tags:Hadoop, Cloud storage, HDFS, Load balancing