Research On The Algorithm Of Parallel SVM In Cloud Computing Environment

Posted on: 2015-11-04
Degree: Master
Type: Thesis
Country: China
Candidate: L N Guo
Full Text: PDF
GTID: 2298330467464523
Subject: Computer application technology
Abstract/Summary:
Classification is an important problem in machine learning research. The support vector machine (SVM), one of the mainstream classification methods, has been widely used in software module defect detection, image recognition, and other fields, and has received a great deal of attention from researchers. However, the classical serial SVM is mainly applied to small-scale datasets and becomes inefficient when processing large-scale datasets. The key task is therefore to design parallel SVM algorithms that are applicable to massive data.

Currently, most research on parallel SVM focuses on the data level: SVMs are trained in parallel on subsets of the original dataset and then merged to obtain the final classification result. By comparison, very little work has been done on parallelizing SVM at the algorithm level, and research that incorporates class distribution information into the SVM model is still rare. This thesis studies the parallelization of the primal estimated sub-gradient solver for SVM (Pegasos) in the MapReduce framework and evaluates it experimentally on software module datasets. The main contributions are as follows:

1. A parallel primal estimated sub-gradient solver for SVM (PPegasos) is proposed. PPegasos parallelizes the main steps of the Pegasos algorithm: the stochastic gradient descent steps and the projection steps. The parallel algorithm is implemented on Hadoop using MapReduce. Experiments are conducted on the software module datasets CM1 and PC1. The results show that PPegasos is effective and applicable to classification problems on large-scale datasets.

2. A parallel primal estimated sub-gradient solver for structural SVM (PSPegasos) is proposed. PSPegasos embeds the structural information of the samples into the Pegasos algorithm and is implemented in the MapReduce framework. Three kinds of structural information of different granularity are considered: overall, class, and cluster. Experiments are conducted on the software module datasets CM1 and PC1. The results show that embedding structural information brings the final classification hyperplane more in line with the direction of the data distribution and improves prediction accuracy.

3. A parallel ensemble primal estimated sub-gradient solver for structural SVM (EPSPegasos) is proposed. EPSPegasos is designed for software module defect detection, which is a class imbalance problem. Clustering-based undersampling retains, to some extent, the distribution information of the samples; it mitigates the information loss caused by plain undersampling and avoids the growth in dataset size caused by oversampling. Several classifiers are trained with MapReduce and then merged to obtain the final classification result. Experiments are conducted on the software module datasets CM1 and PC1. The results show that the algorithm achieves better classification results than a single classifier.
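As a rough illustration of the two Pegasos steps named in the abstract (the stochastic sub-gradient step and the projection step), the following is a minimal single-machine sketch in Python. It is not the thesis's PPegasos implementation: the abstract does not describe how the work is divided across Hadoop tasks, so the split into `map_subgradient` and `reduce_sum`, all function names, the toy data, and the parameter values (`lam`, `iterations`, `batch_size`) are assumptions made for illustration. The update is the standard mini-batch Pegasos rule, w <- (1 - eta*lam)*w + (eta/k) * (sum of y_i*x_i over margin violators in the batch) with eta = 1/(lam*t), followed by projection onto the ball of radius 1/sqrt(lam).

```python
import math
import random


def map_subgradient(w, x, y):
    """'Map' step (assumed split): one sample's contribution to the sub-gradient.

    Returns y * x if the sample violates the margin (y * <w, x> < 1),
    otherwise a zero vector.
    """
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    if margin < 1.0:
        return [y * xi for xi in x]
    return [0.0] * len(x)


def reduce_sum(vectors, dim):
    """'Reduce' step (assumed split): sum the per-sample contributions."""
    total = [0.0] * dim
    for v in vectors:
        for i, vi in enumerate(v):
            total[i] += vi
    return total


def pegasos(samples, labels, lam=0.01, iterations=200, batch_size=4, seed=0):
    """Mini-batch Pegasos for a linear SVM; labels must be +1 or -1."""
    rng = random.Random(seed)
    dim = len(samples[0])
    w = [0.0] * dim
    radius = 1.0 / math.sqrt(lam)
    for t in range(1, iterations + 1):
        eta = 1.0 / (lam * t)  # step size 1 / (lambda * t)
        batch = [rng.randrange(len(samples)) for _ in range(batch_size)]
        contributions = [map_subgradient(w, samples[i], labels[i]) for i in batch]
        grad_sum = reduce_sum(contributions, dim)
        # Stochastic sub-gradient step.
        w = [(1.0 - eta * lam) * wi + (eta / batch_size) * gi
             for wi, gi in zip(w, grad_sum)]
        # Projection step: keep ||w|| <= 1 / sqrt(lambda).
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > radius:
            w = [wi * radius / norm for wi in w]
    return w


def predict(w, x):
    """Sign of the decision value <w, x>."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


# Toy, linearly separable data (hypothetical); the trailing 1.0 is a bias feature.
X = [[2.0, 2.0, 1.0], [3.0, 1.0, 1.0], [2.0, 3.0, 1.0],
     [-2.0, -2.0, 1.0], [-3.0, -1.0, 1.0], [-1.0, -3.0, 1.0]]
Y = [1, 1, 1, -1, -1, -1]

w = pegasos(X, Y)
predictions = [predict(w, x) for x in X]
```

In this sketch the per-sample contributions within a batch are independent of one another, which is what makes a map-then-sum arrangement plausible; how the thesis actually distributes the sub-gradient and projection steps on Hadoop is not stated in the abstract.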
Keywords/Search Tags: Cloud Computing, MapReduce, Support Vector Machine (SVM)