
Study on High Availability and High Efficiency Optimization of MapReduce in Cloud Computing

Posted on: 2016-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: M A Zhou
GTID: 2308330461987826
Subject: Software engineering
Abstract/Summary:
Cloud computing has attracted a growing number of users because of its fault tolerance, high reliability, low cost, and other advantages. As one of the key technologies of cloud computing, MapReduce has become a mainstream parallel programming model within just a few years and is used by companies in many different fields. However, as data volumes grow rapidly and new applications keep appearing, MapReduce has shown obvious shortcomings in availability, scalability, execution efficiency, and other aspects. These shortcomings can even affect the operating efficiency of enterprises, so the optimization of MapReduce has become a hot topic.

The single JobTracker node and its scheduling algorithms greatly simplify the system architecture and the control logic, but they introduce the following performance bottlenecks. First, a failure of the JobTracker can paralyze the entire cluster. Second, the scalability of the whole cluster is limited by the processing capacity of the JobTracker. Third, the execution efficiency of MapReduce is low in a heterogeneous cluster. These bottlenecks hinder normal system operation in a big data environment. To solve these problems, this paper optimizes the availability and efficiency of MapReduce in terms of architecture and scheduling algorithm. The main work is as follows:

⑴ Optimization of the architecture. The single JobTracker node in the original MapReduce is replaced by a distributed JobTracker cluster with multiple JobTracker nodes. Based on this model, an all-to-all communication method is used to optimize communication, and a maintained workload list is used to optimize workload balance.

⑵ Optimization of the scheduling algorithm. After the architecture is optimized, to improve the efficiency of MapReduce in a heterogeneous cloud environment, this paper presents an adaptive scheduler based on workload type in heterogeneous environment (ASBT-HE). It feeds results back to the JobTracker so that subsequent tasks are assigned to appropriate TaskTrackers. To meet the QoS constraint and better balance the system load, automatic queue adjustment is added on top of ASBT-HE. The resulting algorithm is called better adaptive scheduler based on workload type in heterogeneous environment (BASBT-HE).

⑶ Building a simple Hadoop cluster with the optimized MapReduce architecture. On this platform, the high availability and efficiency of the optimized MapReduce are tested by comparison.

The experimental results show that, on the Hadoop platform with the architecture-optimized MapReduce, the failure of JobTracker nodes does not affect normal system operation, and BASBT-HE gives the system higher processing efficiency in a heterogeneous cloud computing environment. The optimized MapReduce therefore has high availability and high efficiency.
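The abstract does not give implementation details, so the following is only a minimal sketch of how a workload list might route jobs across multiple JobTracker nodes and reassign them after a node failure. The class names, the job-count load metric, and the tie-breaking rule are all assumptions made for illustration; they are not taken from the thesis or from the Hadoop API.

```python
from dataclasses import dataclass, field


@dataclass
class JobTrackerNode:
    """Hypothetical stand-in for one JobTracker in the distributed cluster."""
    name: str
    alive: bool = True
    jobs: list = field(default_factory=list)


class JobTrackerCluster:
    """Illustrative only: routes jobs using a shared workload list."""

    def __init__(self, names):
        self.nodes = {n: JobTrackerNode(n) for n in names}

    def workload_list(self):
        # Assumed load metric: number of jobs held by each live node.
        return sorted((len(n.jobs), n.name)
                      for n in self.nodes.values() if n.alive)

    def submit(self, job):
        loads = self.workload_list()
        if not loads:
            raise RuntimeError("no live JobTracker available")
        _, name = loads[0]  # least-loaded live node; ties broken by name
        self.nodes[name].jobs.append(job)
        return name

    def fail(self, name):
        # Mark a node as failed and hand its jobs to the surviving nodes.
        node = self.nodes[name]
        node.alive = False
        orphaned, node.jobs = node.jobs, []
        return {job: self.submit(job) for job in orphaned}


cluster = JobTrackerCluster(["jt1", "jt2", "jt3"])
placement = {job: cluster.submit(job) for job in ["a", "b", "c", "d"]}
moved = cluster.fail("jt1")
```

In this sketch the cluster keeps accepting jobs as long as at least one JobTracker is alive, which is the availability property the abstract claims for the optimized architecture.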
Keywords/Search Tags:cloud computing, MapReduce, architecture, scheduling algorithm, high availability optimization, high efficiency optimization