
The Performance Analysis And Optimization Of Map Reduce Based On Hadoop

Posted on: 2016-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: X H Dai
GTID: 2308330473464460
Subject: Computer application technology
Abstract/Summary:
With the rapid development of network technology, the volume of data of all kinds is growing ever faster. Driven by the need to process this mass of data, cloud computing has attracted the attention of the IT industry and is gradually becoming the dominant computing model. MapReduce is a programming model for cloud computing; its simple, practical interfaces make parallel data processing easier and provide software support for computation over huge data sets.

Iterative computation on MapReduce is one of the areas that needs optimization. First, this thesis analyzes the shortcomings of current mainstream iteration frameworks, in particular three: the level of abstraction is low, task data and static data cannot be processed in parallel, and dynamic data cannot be completely separated. Second, it improves MapReduce to address these problems. To reduce the long running time caused by serial processing, it proposes a parallel processing strategy that splits the Map and Reduce tasks and tests the termination condition in parallel. It also changes the storage strategy on the Map side: static data is stored in the Map, so the computation over static and dynamic data is completed directly in the Map. Because each iteration then needs fewer MapReduce passes, the running time of the whole iterative process decreases.

Traditional MapReduce-based SVM classification algorithms handle the training data set too simply: they just merge the support vectors obtained from training on each node, so neither the efficiency nor the accuracy of the classifier is ideal. To solve this problem, the thesis proposes an improved training algorithm.
First, it uses a genetic algorithm on every node simultaneously to find the optimal kernel function and parameters. Second, each node uses this combination to train on its data subset and obtain support vectors. Third, the support vectors from all nodes are combined into a global support vector set. Fourth, every data subset is merged with the global support vector set on its node to form a new training data set. These four steps are repeated until the global support vector set no longer changes, that is, until it converges to the optimal classification model.

An experimental platform was built and programmed to verify the work: the optimized framework significantly reduces iterative computation time compared with the mainstream framework, and PISVMAM is shown to be a clear improvement over traditional classification.
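
The four training steps described above can be illustrated with a minimal single-process sketch. It is not the thesis implementation and does not use the Hadoop API: the "map" and "reduce" phases are simulated with plain Python lists, the genetic-algorithm search for the kernel and parameters is omitted, and the local trainer `toy_1d_support_vectors` is a hypothetical stand-in that only works for separable one-dimensional data whose negative class lies below the positive class. The names `train_iteratively`, `train_local` and `max_rounds` are likewise assumptions made for illustration.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Sample = Tuple[float, int]  # (feature value, label in {-1, +1})


def toy_1d_support_vectors(samples: Sequence[Sample]) -> FrozenSet[Sample]:
    """Stand-in for a real SVM trainer (assumption, not from the thesis).

    For separable 1-D data with negatives below positives, the hard-margin
    support vectors are the largest negative and the smallest positive point.
    """
    negatives = [s for s in samples if s[1] == -1]
    positives = [s for s in samples if s[1] == +1]
    found = set()
    if negatives:
        found.add(max(negatives))
    if positives:
        found.add(min(positives))
    return frozenset(found)


def train_iteratively(
    partitions: Sequence[Sequence[Sample]],
    train_local: Callable[[Sequence[Sample]], FrozenSet[Sample]] = toy_1d_support_vectors,
    max_rounds: int = 20,
) -> Tuple[FrozenSet[Sample], int]:
    """Repeat train / merge / redistribute until the global set is stable.

    Returns the global support vector set and the number of rounds run.
    """
    # Each "node" starts with only its own data subset.
    working: List[List[Sample]] = [list(p) for p in partitions]
    global_svs: FrozenSet[Sample] = frozenset()

    for round_no in range(1, max_rounds + 1):
        # "Map" phase: every node trains on its current working set.
        local_svs = [train_local(w) for w in working]

        # "Reduce" phase: combine all local support vectors into a global set.
        merged = frozenset().union(*local_svs)

        # Termination test: stop once the global set no longer changes.
        if merged == global_svs:
            return global_svs, round_no
        global_svs = merged

        # Merge the global set back into every original data subset.
        working = [list(p) + sorted(global_svs) for p in partitions]

    return global_svs, max_rounds
```

Called on a few small partitions, the loop first collects each node's local candidates, then narrows them to the closest pair of opposite-class points across all partitions, and stops in the round where the merged set repeats. In a real deployment the stand-in trainer would be replaced by an actual SVM solver, and the per-node kernel and parameter search would run before each local training step.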
Keywords/Search Tags:Cloud computing, MapReduce, Iterative calculation, SVM algorithm