Design And Implementation Of Parallel SVM Algorithm For Large Scale Text Data

Posted on:2014-02-25

Degree:Master

Type:Thesis

Country:China

Candidate:X Zhao

Full Text:PDF

GTID:2298330422969049

Subject:Software engineering

Abstract/Summary:

Support Vector Machine (SVM) is a classification algorithm from data mining that is based on statistical learning theory. Because it is less prone to over-fitting and is relatively insensitive to the curse of dimensionality caused by a large number of features, it is widely used in text classification, image recognition, and pattern recognition. On massive data, however, SVM training is slow: the training result and the trained model cannot be obtained in a short time, which keeps the SVM algorithm from being applied to large-scale data processing. This thesis therefore improves SVM training in two ways so that it can handle massive data sizes: it improves the quadratic programming computation inside the SVM algorithm, and it applies distributed computing.

First, the thesis applies the method of feasible directions to the SVM quadratic program, giving a new method that is simpler and computationally more efficient. The new method uses a "coefficient adaptation method" in place of the original quadratic programming method. In addition, it reduces the original procedure for determining the step coefficient to solving a quadratic equation in one unknown. Together, these two improvements simplify the operation steps and reduce the computational complexity.

Second, the new SVM algorithm is parallelized on the Hadoop parallel computing framework, which is based on the MapReduce model, and the distributed storage scheme of HBase is used to store the data and the calculation results. Combining parallel computing with distributed storage in the SVM training process greatly enhances the ability of SVM to process massive data at high performance.

Finally, based on these two improvements, a text classification system for large-scale data sets is implemented. On a distributed cluster of 8 ordinary PCs, and with the same data size, performance improved by a factor of 4 to 5 relative to the serial SVM training process. This demonstrates the advantages of parallel SVM training in classification performance, classification speed, and the scale of data that can be processed.
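The abstract does not give the formulas behind the "coefficient adaptation method" or the step-coefficient equation, so the following is only a generic sketch of how a step along a feasible direction can be obtained in closed form for the standard SVM dual, not the thesis's own procedure. It assumes the dual objective W(a) = sum(a) - 0.5 * a^T Q a with box constraints 0 <= a_i <= C. Restricted to a line a + t*d, this objective is a quadratic in the single unknown t, so the best step follows from one division and a clip to the box. The names `exact_step`, `dot`, and `matvec` are hypothetical helpers introduced for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def matvec(Q, v):
    """Product of a matrix (list of rows) with a vector."""
    return [dot(row, v) for row in Q]


def exact_step(Q, alpha, d, C):
    """Step t >= 0 that maximizes W(alpha + t*d) for the assumed dual
    W(a) = sum(a) - 0.5 * a^T Q a, subject to 0 <= a_i <= C.

    Along the line, W(alpha + t*d) = W(alpha) + t*slope - 0.5*t^2*curv,
    a one-variable quadratic in t.
    """
    grad = [1.0 - q for q in matvec(Q, alpha)]  # gradient of W at alpha
    slope = dot(grad, d)                        # dW/dt at t = 0
    if slope <= 0.0:
        return 0.0                              # d is not an ascent direction
    curv = dot(d, matvec(Q, d))                 # >= 0 when Q is positive semidefinite

    # Largest step that keeps every coordinate inside [0, C].
    t_max = float("inf")
    for a_i, d_i in zip(alpha, d):
        if d_i > 0.0:
            t_max = min(t_max, (C - a_i) / d_i)
        elif d_i < 0.0:
            t_max = min(t_max, -a_i / d_i)

    # Unconstrained maximizer of the one-variable quadratic.
    t_free = slope / curv if curv > 0.0 else float("inf")
    return min(t_free, t_max)


# Toy problem (illustrative data, not from the thesis):
# x1 = +1 with label +1, x2 = -1 with label -1, linear kernel,
# so Q[i][j] = y_i * y_j * x_i * x_j = 1 for every entry.
Q = [[1.0, 1.0], [1.0, 1.0]]
# d = [1, 1] satisfies the equality constraint y^T d = 0.
print(exact_step(Q, [0.0, 0.0], [1.0, 1.0], C=1.0))  # 0.5
```

In this toy case the step 0.5 gives alpha = [0.5, 0.5], which corresponds to w = 1 and puts both points exactly on the margin. With a smaller C the box constraint becomes active and the step is clipped to it.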

Keywords/Search Tags:

Support Vector Machine, Quadratic Programming, Feasible Direction Method, Text Classification, Hadoop, HBase

Related items

1	An Alternative Multiplicative Updates Algorithm For Quadratic Programming In Support Vector Machines
2	Support Vector Machine Algorithm And Its Application To Intrusion Detection
3	Multi-classification Algorithm Based On Nonparallel Support Vector Machine
4	Research On Text Classification Of Mixed-kernel Parallel Support Vector Machine Based On Hadoop
5	Support Vector Machine Classification And Face Detection Applications
6	Research On Augmented Lagrangian Method For Support Vector Machine
7	Research On Text Classification Method Based On Support Vector Machine
8	Research And Implementation Of Chinese Text Classification Based On Hadoop And SVM Algorithm
9	Research On Support Vector Machine Algorithm For Binary Classification Problem
10	Research And Implement Of Chinese Text Categorization Algorithm Based On SVM