The Performance Analysis And Optimization Of Map Reduce Based On Hadoop

Posted on:2016-05-07

Degree:Master

Type:Thesis

Country:China

Candidate:X H Dai

Full Text:PDF

GTID:2308330473464460

Subject:Computer application technology

Abstract/Summary:

With the rapid development of network technology, data of all kinds is growing ever faster. Driven by the need to process massive data sets, cloud computing has attracted the attention of the IT industry and is gradually becoming the dominant computing paradigm. MapReduce is a programming model for cloud computing; its simple, practical interfaces make parallel data processing easier and provide software support for computation over huge amounts of data.

Iterative computation is one of the areas of MapReduce that needs optimization. First, this thesis analyzes the shortcomings of current mainstream iterative frameworks, in particular three: the level of abstraction is low, task data and static data cannot be processed in parallel, and dynamic data cannot be completely separated. Second, it improves MapReduce to address these problems. To reduce the long computation time caused by serial processing, it proposes a parallel processing strategy that splits the Map tasks and Reduce tasks and tests the termination condition in parallel. It also improves the storage strategy of Map: static data is stored in the Map, and the computation involving static data and dynamic data is completed directly in the Map. Because each iteration then requires fewer MapReduce passes, the iteration time of the whole process decreases.

Traditional MapReduce-based SVM classification algorithms train on the data set too simply: they merely merge the support vectors obtained by training on each node, so neither the efficiency nor the accuracy of the resulting classifier is ideal. To address this, the thesis proposes an improved training algorithm. First, a genetic algorithm selects the optimal kernel function and parameters on every node simultaneously. Second, each node trains on its data subset with that combination to obtain support vectors. Third, all support vectors from training are combined into a global support vector set. Fourth, each node merges its data subset with the global support vector set to form a new training data set. These four steps repeat until the global support vector set no longer changes, that is, until training converges to the optimal classification model.

Through programming and the construction of an experimental platform, the thesis verifies that the optimized framework significantly reduces iterative computation time compared with the mainstream framework, and that PISVMAM clearly improves on traditional classification.
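The idea of keeping static data inside the Map across iterations can be illustrated with a small in-process simulation. This is only a sketch under stated assumptions, not the thesis's implementation: the workload (one-dimensional k-means), the class `CachingMapTask`, and the functions `reduce_phase` and `iterate` are all hypothetical names chosen for illustration, and no Hadoop API is used. The points play the role of static data, loaded once per map task, and the centroids play the role of dynamic data, passed in on every iteration.

```python
from collections import defaultdict


class CachingMapTask:
    """Hypothetical map task that keeps its static partition in memory."""

    def __init__(self, static_points):
        # Static data: loaded once and reused in every iteration.
        self.static_points = static_points

    def run(self, centroids):
        # Combine the cached static data with the dynamic data (centroids)
        # directly in the map step, emitting (centroid index, point) pairs.
        emitted = []
        for point in self.static_points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(point - centroids[i]))
            emitted.append((nearest, point))
        return emitted


def reduce_phase(pairs, old_centroids):
    """Average the points assigned to each centroid."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # A centroid with no assigned points keeps its previous value.
    return [sum(groups[i]) / len(groups[i]) if groups[i] else old
            for i, old in enumerate(old_centroids)]


def iterate(map_tasks, centroids, tol=1e-9, max_iter=100):
    """Run map and reduce rounds until the dynamic data stops changing."""
    for iteration in range(1, max_iter + 1):
        pairs = [pair for task in map_tasks for pair in task.run(centroids)]
        new_centroids = reduce_phase(pairs, centroids)
        # Termination test: largest centroid movement below the tolerance.
        shift = max(abs(a - b) for a, b in zip(new_centroids, centroids))
        if shift < tol:
            return new_centroids, iteration
        centroids = new_centroids
    return centroids, max_iter
```

Calling `iterate` with two `CachingMapTask` instances and a list of initial centroids returns the final centroids and the number of iterations used. Only the centroids move between rounds; the points are never reloaded, which is the property the storage strategy relies on.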

Keywords/Search Tags:

Cloud computing, MapReduce, Iterative calculation, SVM algorithm

Related items

1	The Performance Analysis And Optimization Of Map Reduce Based On Hadoop
2	Research On Iterative Computations For Big Data In The Cloud
3	Scalable parallel computing on clouds: Efficient and scalable architectures to perform pleasingly parallel, MapReduce and iterative data intensive computations on cloud environments
4	The Research Of Task Scheduling Algorithm For Mapreduce Framework In Cloud Environment
5	Research And Improvement Of MapReduce Scheduling Mechanism On Cloud Computing
6	Cloud Computing And A Number Of Data Mining Algorithms Mapreduce Research
7	Research On Verifiable Computation Based On MapReduce In Cloud Computing
8	Performance Optimization And Applications Of MapReduce In Cloud Computing
9	Research On MapReduce Fair Scheduling Algorithm In Heterogeneous Cloud Computing Environment
10	Research And Improvement Of The MapReduce Framework In Cloud Computing