
High Availability Optimization Technology Study on HDFS Metadata Management

Posted on: 2017-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: C Li
Full Text: PDF
GTID: 2348330503487193
Subject: Computer Science and Technology
Abstract/Summary:
With the falling cost of data storage devices, storage capacity is no longer the main factor deciding system performance; instead, the availability of the data storage system has become an important indicator for evaluating a system. From the perspective of system high availability, this thesis studies high-availability optimization techniques for metadata management on the popular big data platform, the Hadoop Distributed File System (HDFS), in order to improve the availability of the current HDFS HA scheme.

The research focuses on system availability when HDFS metadata management nodes fail. Based on the HDFS Federation and HA organization modes, the thesis identifies three problems: after a single metadata node fails, the system is left in a non-high-availability state; after a double-node failure, starting a new cold-backup node takes a large amount of startup time; and the centralized cache policy lacks a cache replacement policy.

To address these problems, this thesis proposes an optimization technique whose main idea is to combine the Active-Standby mode with a dual-active redundancy mode across the two separate namespaces of a Federation deployment. When a node fails, the standby node of the other namespace is selected, and after failover it forms a new high-availability pair with the surviving node. When both nodes of the same namespace fail, the standby node of the other namespace takes over as the active node of the faulty namespace, so that the replacement node is brought up in hot-backup mode rather than from a cold start.

Based on this design, the thesis gives a detailed design and implementation of several techniques: an optimized shared storage mechanism based on the Quorum Journal Manager (QJM), an optimized fault detection and switching mechanism based on the ZKFailoverController (ZKFC), and optimized dynamic maintenance of the data block mapping table. In addition, according to the concentrated and bursty characteristics of metadata access requests, the thesis designs a new metadata cache replacement policy to improve system availability. The new policy takes data popularity as the replacement criterion and combines a neural network prediction model, a multiple linear regression model, and an attenuation function model into a single combined model that periodically predicts data popularity.

Finally, experiments analyze the overall performance of the optimization techniques. The results show that, after node failures, the techniques can form a new high-availability combination and start the replacement node in hot-backup mode, while ensuring data consistency and other performance requirements. In addition, tests with the SimpleScalar simulator verify that the new centralized cache replacement policy slightly improves the metadata cache hit ratio. In summary, the proposed optimization techniques meet the system design requirement of high availability.
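To make the cross-namespace failover idea described above concrete, the following is a minimal sketch in Python of the selection logic: two Federation namespaces, each with an Active/Standby NameNode pair; a single-node failure is handled by an ordinary in-namespace failover, while a double-node failure borrows the other namespace's standby as the faulty namespace's new active (hot backup) instead of cold-starting a fresh node. All class and method names here are illustrative assumptions, not Hadoop APIs or the thesis' actual implementation.

from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Role(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    FAILED = "failed"

@dataclass
class NameNode:
    name: str
    namespace: str
    role: Role

@dataclass
class Namespace:
    name: str
    active: NameNode
    standby: NameNode

def handle_failure(failed_ns: Namespace, peer_ns: Namespace) -> Optional[NameNode]:
    """Return the node that becomes active for failed_ns, or None if no candidate.

    Assumes the caller has already marked the failed node(s) with Role.FAILED.
    """
    # Single-node failure: the local standby is still healthy, so perform an
    # ordinary Active-Standby failover inside the namespace.
    if failed_ns.standby.role == Role.STANDBY:
        failed_ns.standby.role = Role.ACTIVE
        failed_ns.active, failed_ns.standby = failed_ns.standby, failed_ns.active
        return failed_ns.active
    # Double-node failure: borrow the peer namespace's standby as a hot backup
    # and make it the active node of the faulty namespace.
    if peer_ns.standby.role == Role.STANDBY:
        candidate = peer_ns.standby
        candidate.role = Role.ACTIVE
        candidate.namespace = failed_ns.name
        failed_ns.active = candidate
        return candidate
    return None

The point of the sketch is only the selection rule: a healthy standby anywhere in the paired namespaces can be promoted, which avoids the long startup time of a cold-backup node described in the abstract.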
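The data-popularity replacement criterion described above can likewise be sketched. The fragment below combines an attenuation (decay) component with a linear-regression trend component into one popularity score and evicts the least popular cached block. The weights, window sizes, and function names are assumptions for illustration; the thesis' combined model additionally includes a neural-network predictor, which is omitted here.

import math
import time
from dataclasses import dataclass, field
from typing import List

@dataclass
class BlockStats:
    block_id: str
    access_times: List[float] = field(default_factory=list)  # Unix timestamps

    def record_access(self) -> None:
        self.access_times.append(time.time())

def decay_score(stats: BlockStats, now: float, half_life: float = 3600.0) -> float:
    """Attenuation-function component: recent accesses contribute more."""
    lam = math.log(2) / half_life
    return sum(math.exp(-lam * (now - t)) for t in stats.access_times)

def regression_score(stats: BlockStats, now: float,
                     window: float = 3600.0, periods: int = 4) -> float:
    """Least-squares trend over per-period access counts, extrapolated one period ahead."""
    counts = []
    for i in range(periods):
        start, end = now - (i + 1) * window, now - i * window
        counts.append(sum(1 for t in stats.access_times if start <= t < end))
    counts.reverse()  # oldest period first
    n = len(counts)
    xs = list(range(n))
    x_mean, y_mean = sum(xs) / n, sum(counts) / n
    denom = sum((x - x_mean) ** 2 for x in xs) or 1.0
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, counts)) / denom
    intercept = y_mean - slope * x_mean
    return max(0.0, intercept + slope * n)  # predicted count for the next period

def popularity(stats: BlockStats, now: float,
               w_decay: float = 0.5, w_reg: float = 0.5) -> float:
    """Combined popularity score used as the replacement criterion."""
    return w_decay * decay_score(stats, now) + w_reg * regression_score(stats, now)

def choose_victim(cached: List[BlockStats]) -> BlockStats:
    """Evict the cached block with the lowest predicted popularity."""
    now = time.time()
    return min(cached, key=lambda s: popularity(s, now))

In this form the score is recomputed on demand; a periodic recomputation, as the abstract describes, would simply cache the popularity values each prediction cycle and reuse them between cycles.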
Keywords/Search Tags: HDFS, metadata management, node failure, cache replacement policy, high availability