Research On Performance Optimization Based On MapReduce

Posted on: 2016-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Li
Full Text: PDF
GTID: 2208330461485952
Subject: Computer application technology
Abstract/Summary:
With the development of the Internet, demand for data-processing capacity has surged, and distributed data processing technology has emerged in response. Hadoop, the open-source distributed system infrastructure of the Apache Foundation, has been widely adopted by major Internet companies across many fields since its release, thanks to its scalability, low cost, reliability, and efficiency. As the Internet becomes more widespread and user demand grows, data volumes keep increasing, and both the task-processing speed and the power consumption of large-scale clusters face more severe challenges.

Hadoop consists of two core components: HDFS (Hadoop Distributed File System), a distributed file system, and MapReduce, a distributed parallel computing framework. Work on optimizing HDFS mainly concerns adaptive adjustment of read/write modes and strategies based on traditional techniques, whereas research on MapReduce performance optimization falls mainly into three areas: application configuration optimization, parameter optimization, and scheduling algorithm optimization.

This thesis focuses on node load imbalance in heterogeneous clusters and on dynamic load coordination. It proposes migrating the cluster monitoring module and the cluster scheduling module to the slave nodes in order to reduce the load on the master node, and gives the design flow chart. On the question of which indicators determine cluster load, the thesis analyzes several traditional parameters and argues that judging a node's load from a single parameter has significant shortcomings. It proposes using the node's maximum number of parallel tasks, its queue length, and its CPU utilization to calculate the load the cluster is likely to have at the next timestamp. The predicted load is then used to infer each node's load status at the next timestamp, and the adjustment strategy is chosen according to that prediction.

In the experimental part, CloudSim is used to simulate an actual Hadoop cluster. The results of Hadoop's default scheduling and of the optimized scheduling scheme are compared and analyzed. They verify that the proposed scheme does improve cluster performance, although it also causes CPU utilization to fluctuate.
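
The abstract names the three inputs to the load index (maximum number of parallel tasks, queue length, CPU utilization) but gives neither the formula nor the prediction method. The Python sketch below is only an illustration under stated assumptions: the weighted sum, the weights, the use of exponential smoothing as the predictor, and every name in it (NodeSample, load_index, predict_next_load, pick_node) are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class NodeSample:
    """One monitoring sample for a node (field names are illustrative)."""
    max_parallel_tasks: int  # maximum number of tasks the node can run in parallel
    queue_length: int        # tasks currently waiting on the node
    cpu_utilization: float   # fraction in [0, 1]


def load_index(sample, w_queue=0.5, w_cpu=0.5):
    """Combine the three parameters into one load value in [0, 1].

    Assumption: a weighted sum of queue pressure (queue length relative
    to the node's parallel capacity) and CPU utilization. The weights
    are placeholders, not values from the thesis.
    """
    if sample.max_parallel_tasks <= 0:
        raise ValueError("max_parallel_tasks must be positive")
    queue_pressure = min(sample.queue_length / sample.max_parallel_tasks, 1.0)
    cpu = min(max(sample.cpu_utilization, 0.0), 1.0)
    return w_queue * queue_pressure + w_cpu * cpu


def predict_next_load(history, alpha=0.5):
    """Forecast the load at the next timestamp from past load values.

    Assumption: simple exponential smoothing stands in for the
    unspecified prediction method.
    """
    if not history:
        raise ValueError("history must not be empty")
    forecast = history[0]
    for value in history[1:]:
        forecast = alpha * value + (1 - alpha) * forecast
    return forecast


def pick_node(predicted_loads):
    """Return the node name with the lowest predicted load."""
    return min(predicted_loads, key=predicted_loads.get)


if __name__ == "__main__":
    # Two hypothetical nodes, each with a short history of samples.
    samples = {
        "node-a": [NodeSample(4, 1, 0.30), NodeSample(4, 3, 0.60)],
        "node-b": [NodeSample(8, 1, 0.20), NodeSample(8, 2, 0.25)],
    }
    predicted = {
        name: predict_next_load([load_index(s) for s in history])
        for name, history in samples.items()
    }
    print(pick_node(predicted))
```

Dividing the queue length by the node's parallel capacity is one way to keep nodes of different sizes comparable in a heterogeneous cluster. A scheduler following the abstract's idea would recompute the forecast at each timestamp and steer new tasks away from nodes whose predicted load is high.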
Keywords/Search Tags:Cluster, MapReduce, Load forecasting