Research on Key Technologies of Cloud Storage Based on HDFS

Posted on: 2016-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: R Yi
Full Text: PDF
GTID: 2348330485499988
Subject: Software engineering
Abstract/Summary:
A cloud storage system stores large volumes of data, and its bottom layer uses a distributed file system as the storage platform. HDFS (Hadoop Distributed File System) is the distributed file system of Hadoop, an open source cloud computing platform; its design is simple and it is widely used. However, with the exponential growth of data, HDFS has found it difficult to meet the growing demand for data storage in terms of availability, reliability, scalability, and data access performance. HDFS uses a single metadata server (MDS) to manage the metadata of the whole system. This keeps the design simple, but the single metadata server often becomes the performance bottleneck of the system, limits the amount of metadata the system can hold, and introduces a single point of failure: when it fails, the whole system stops working, which hurts availability. In addition, HDFS by default stores three replicas of each file to ensure data reliability, but the number of replicas is fixed, and the choice of replica locations may cause load imbalance, which reduces system efficiency.

To solve these problems, this thesis carries out the following research. It studies the system architecture and working principles of HDFS in depth. It proposes a clustered metadata server architecture and, based on this architecture, an improved hash algorithm built on virtual nodes; metadata is partitioned by combining subtree partitioning with the improved consistent hashing algorithm. To address the local overheating (hot spots) caused by uneven data access, it proposes a dynamic load balancing algorithm that migrates virtual nodes under the improved hash algorithm.
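The abstract does not give the details of the improved hash algorithm, so the following is only a minimal Python sketch of the general technique it names: a consistent hash ring with virtual nodes, keyed by a directory subtree prefix, with one hot virtual node reassigned to the least loaded server. The class `VirtualNodeRing`, the helper `subtree_key`, the prefix depth, the MD5 hash, and the migration rule are all assumptions made for illustration and are not taken from the thesis.

```python
import bisect
import hashlib
from collections import Counter


def ring_hash(key):
    """Map a string to a point on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def subtree_key(path, depth=2):
    """Hypothetical subtree rule: paths sharing their first `depth`
    components share a key, so a whole subtree lands on one server."""
    parts = [p for p in path.split("/") if p]
    return "/" + "/".join(parts[:depth])


class VirtualNodeRing:
    def __init__(self, servers, vnodes_per_server=64):
        # owner maps each virtual node's ring position to a metadata server.
        self.owner = {}
        for server in servers:
            for i in range(vnodes_per_server):
                self.owner[ring_hash(f"{server}#vn{i}")] = server
        self.points = sorted(self.owner)

    def _point_for(self, path):
        # First virtual node clockwise from the key's hash, wrapping around.
        key_hash = ring_hash(subtree_key(path))
        idx = bisect.bisect_right(self.points, key_hash) % len(self.points)
        return self.points[idx]

    def lookup(self, path):
        """Return the metadata server responsible for `path`."""
        return self.owner[self._point_for(path)]

    def migrate_hottest_vnode(self, access_log):
        """Move the busiest virtual node of the busiest server to the
        least busy server. Returns (vnode, hot, cold) or None."""
        per_vnode = Counter(self._point_for(p) for p in access_log)
        per_server = Counter({s: 0 for s in set(self.owner.values())})
        for point, hits in per_vnode.items():
            per_server[self.owner[point]] += hits
        hot = max(per_server, key=per_server.get)
        cold = min(per_server, key=per_server.get)
        if hot == cold:
            return None  # load is already even (or the log is empty)
        candidates = [pt for pt in per_vnode if self.owner[pt] == hot]
        victim = max(candidates, key=per_vnode.get)
        self.owner[victim] = cold  # ring positions stay fixed; only ownership changes
        return victim, hot, cold


ring = VirtualNodeRing(["mds1", "mds2", "mds3"])
log = [f"/hot/dir/file{i}" for i in range(100)]
print(ring.lookup("/hot/dir/file0"), ring.migrate_hottest_vnode(log) is not None)
```

Because only the ownership of one virtual node changes, keys on every other virtual node keep their server, which is the usual reason for rebalancing at virtual node granularity.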
Experiments show that the load balancing algorithm is effective. To improve the reliability of cloud data, and because current file systems fix the number of replicas and may place them in ways that cause load imbalance, a dynamic replica count adjustment strategy is proposed, which adjusts the number of replicas as user access heat and request response time change. To make access more efficient, a replica placement algorithm based on transmission cost is proposed, which places each replica at the location that gives the highest access efficiency. Finally, experiments confirm the effectiveness of the replica strategy.
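The abstract states the inputs of the replica strategy (access heat, response time, transmission cost) but not its formulas. The sketch below shows one way such a strategy could look in Python. Every threshold, the upper bound `MAX_REPLICAS`, the cost model in `transmission_cost`, and all function and field names are hypothetical; only the lower bound of three replicas comes from the HDFS default mentioned above.

```python
from dataclasses import dataclass

MIN_REPLICAS = 3  # HDFS default replication factor
MAX_REPLICAS = 8  # hypothetical upper bound


@dataclass
class FileStats:
    replicas: int
    accesses_per_min: float  # stands in for "access heat"
    avg_response_ms: float


def target_replicas(stats, hot=100.0, cold=10.0, slow_ms=200.0):
    """Add a replica for hot, slow files; drop one for cold files."""
    if stats.accesses_per_min >= hot and stats.avg_response_ms >= slow_ms:
        return min(stats.replicas + 1, MAX_REPLICAS)
    if stats.accesses_per_min <= cold:
        return max(stats.replicas - 1, MIN_REPLICAS)
    return stats.replicas


def transmission_cost(size_mb, bandwidth_mbps, latency_ms):
    """Assumed cost model: latency plus transfer time, in milliseconds."""
    return latency_ms + (size_mb * 8.0 / bandwidth_mbps) * 1000.0


def pick_node(size_mb, candidates, existing):
    """Choose the cheapest node that does not already hold a replica.
    candidates maps node name -> (bandwidth_mbps, latency_ms)."""
    free = {n: v for n, v in candidates.items() if n not in existing}
    if not free:
        return None
    return min(free, key=lambda n: transmission_cost(size_mb, *free[n]))


stats = FileStats(replicas=3, accesses_per_min=500.0, avg_response_ms=350.0)
nodes = {"dn1": (1000.0, 1.0), "dn2": (100.0, 1.0), "dn3": (1000.0, 50.0)}
if target_replicas(stats) > stats.replicas:
    print(pick_node(128.0, nodes, existing={"dn1"}))
```

Requiring both high heat and slow responses before adding a replica is a design choice of this sketch: it avoids creating replicas for files that are popular but already served quickly.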
Keywords/Search Tags: Cloud storage, Distributed file system, HDFS, Metadata, Load balancing, Replica