Research For Non-linear Support Vector Machine Classification Algorithm Based On MapReduce

Posted on: 2015-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Ma
GTID: 2298330431493883
Subject: Computer software and theory
Abstract/Summary:
The Support Vector Machine (SVM) is a robust, stable, and highly precise algorithm in the fields of data mining and machine learning, and it handles both classification and regression problems well. However, because it is computationally intensive, the non-linear SVM algorithm has been limited to small-sample statistical learning problems. Faced with practical problems involving huge amounts of data, this thesis aims to improve processing capacity and efficiency while preserving the precision of the standard SVM algorithm. Based on an in-depth analysis of the standard non-linear SVM algorithm and the MapReduce programming model, the thesis carries out the following research and obtains the corresponding results.

First, to improve the data processing capacity and efficiency of the serial SVM algorithm, a parallel SVM algorithm based on MapReduce (MR-SVM) is presented. An SVM classifier is obtained by splitting the large dataset, computing the support vector set of each split concurrently across the map units, and then combining the partial support vector sets into the training set for global training in the reduce phase. It is this global training that drives the convergence of MR-SVM automatically.

Secondly, to make up for the loss of accuracy introduced by the distributed training of MR-SVM, a parallel, iterative SVM algorithm based on MapReduce (MR-C-SVM) is presented. By introducing iterative computing, MR-C-SVM converges to the global optimal solution through feedback-loop training. The amount of computation in the iterations is reduced by using the KKT conditions to filter the data subsets.

Thirdly, to meet the requirements of online learning and to cope with cluster storage capacity that is too small for the data, a parallel, incremental, iterative SVM algorithm based on MapReduce (MR-II-SVM) is proposed on the basis of MR-SVM and MR-C-SVM. The incremental learning method enables MR-C-SVM to handle continuously updated datasets by adopting the dynamic data allocation strategy of Hadoop together with the iterative mechanism.

Finally, it is proved theoretically that MR-C-SVM always converges to the global optimum, and the pipeline performance indicators of MR-II-SVM are calculated. Experimental results on UCI standard datasets show that MR-C-SVM improves the data processing capacity and efficiency of the standard SVM algorithm while maintaining its high accuracy, and that it exceeds the precision of other parallel SVM algorithms based on MapReduce. MR-II-SVM has substantial advantages over other high-precision algorithms in terms of speedup ratio and efficiency.
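The map, reduce, and KKT-filtered feedback steps described in the abstract can be sketched roughly as follows. This is a minimal single-process illustration and not the thesis's implementation. Every name in it (`train_svm`, `map_phase`, `reduce_phase`, `kkt_violators`, `mr_c_svm`) is hypothetical, the solver is a simplified bias-free kernel SVM trained by dual coordinate ascent, and the feedback rule (retrain on the current support vectors plus the points that violate the margin condition) is only one plausible reading of "feedback loop training". A real deployment would run the map and reduce steps as Hadoop jobs.

```python
import math

def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two points given as tuples."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def train_svm(data, C=10.0, gamma=1.0, sweeps=200):
    """Bias-free kernel SVM via dual coordinate ascent (a simplification).
    data: list of (x, y) with y in {-1, +1}. Returns support vectors (x, y, alpha)."""
    n = len(data)
    K = [[rbf(data[i][0], data[j][0], gamma) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            f_i = sum(alpha[j] * data[j][1] * K[i][j] for j in range(n))
            # Newton step on the dual in coordinate i, clipped to the box [0, C].
            step = (1.0 - data[i][1] * f_i) / K[i][i]
            alpha[i] = min(C, max(0.0, alpha[i] + step))
    return [(x, y, a) for (x, y), a in zip(data, alpha) if a > 1e-8]

def decision(model, x, gamma=1.0):
    """Decision value f(x) = sum_i alpha_i * y_i * K(x_i, x)."""
    return sum(a * y * rbf(sx, x, gamma) for sx, y, a in model)

def map_phase(splits, C=10.0, gamma=1.0):
    """Each (simulated) map unit trains on its split and emits its support vectors."""
    return [train_svm(split, C, gamma) for split in splits]

def reduce_phase(partial_models, C=10.0, gamma=1.0):
    """Merge the partial support vector sets and run the global training."""
    merged = [(x, y) for model in partial_models for x, y, _ in model]
    return train_svm(merged, C, gamma)

def kkt_violators(model, split, gamma=1.0, tol=1e-3):
    """Points outside the model (alpha = 0) must satisfy y * f(x) >= 1;
    return the points of the split that break this condition."""
    in_model = {(x, y) for x, y, _ in model}
    return [(x, y) for x, y in split
            if (x, y) not in in_model and y * decision(model, x, gamma) < 1.0 - tol]

def mr_c_svm(splits, C=10.0, gamma=1.0, rounds=5):
    """Assumed feedback loop: retrain on support vectors plus KKT violators
    until no point in any split violates the condition."""
    model = reduce_phase(map_phase(splits, C, gamma), C, gamma)
    for _ in range(rounds):
        violators = list(dict.fromkeys(
            p for split in splits for p in kkt_violators(model, split, gamma)))
        if not violators:
            break
        support = [(x, y) for x, y, _ in model]
        model = train_svm(support + violators, C, gamma)
    return model
```

Calling `mr_c_svm(splits)` with `splits` given as a list of lists of `(point, label)` pairs, where each point is a tuple of floats and each label is -1 or +1, returns the final support vectors. `decision(model, x)` then gives the signed score for a new point. Filtering with `kkt_violators` means each feedback round retrains on only a small subset of the data, which is the saving the abstract attributes to the KKT conditions.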
Keywords/Search Tags: Support Vector Machine, MapReduce, parallel computing, iterative computing, convergence, incremental learning