
Research on Distributed Support Vector Machine (SVM) Based on the Hadoop Cloud Platform

Posted on: 2015-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: K Niu
Full Text: PDF
GTID: 2268330428462822
Subject: Computer application technology
Abstract/Summary:
The support vector machine (SVM) is a machine learning method based on statistical learning theory, proposed by Vapnik et al. Built on the VC-dimension theory and the principle of structural risk minimization, SVM performs well as a classification method on small-sample, nonlinear, and high-dimensional data sets and on pattern recognition problems. It has therefore attracted growing attention from experts and scholars in many fields and has become a powerful tool for classification and regression in data mining. However, as data sets grow in size, training an SVM to find the globally optimal support vectors becomes slow and consumes substantial computing resources; on very large data sets a training model may not be obtainable within an acceptable time under practical conditions. The emergence of cloud computing offers a way forward for massive data mining.
The powerful storage capacity of a distributed file system on a cloud platform, together with the parallelization of traditional data mining algorithms, provides a good opportunity for the development of massive data mining technology. This paper examines in depth the Hadoop Distributed File System (HDFS) and the MapReduce distributed programming framework of Hadoop, currently the most popular cloud platform, along with the inner working mechanisms of the MapReduce computing framework, and builds a fully distributed Hadoop cluster on Hadoop-1.0.0 in a Linux environment. Relying on HDFS, the Hadoop cloud platform stores large data sets in blocks. By reading the dfs.block.size property in the hdfs-site.xml configuration file, this paper splits the data set into blocks of fixed capacity, and a parallel SVM based on the MapReduce programming framework then trains the allocated blocks on the DataNodes. In the traditional SVM training process, parameter settings depend mainly on experience.
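The per-block training and combination step can be sketched as follows. This is a minimal, self-contained illustration, not the thesis's implementation: it assumes a cascade-style combination (each "map" task trains an SVM on one data block and keeps its support vectors; the "reduce" step retrains on the pooled support vectors), and it uses a simple Pegasos-style linear SVM in place of a full kernel SVM. All function names are illustrative.

```python
import random

def train_linear_svm(data, lam=0.01, epochs=50):
    """Pegasos-style SGD for a linear SVM; returns the weight vector."""
    dim = len(data[0][0])
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # regularization shrink, then a hinge-loss step if inside the margin
            w = [(1 - eta * lam) * wi for wi in w]
            if margin < 1:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def support_vectors(data, w, tol=1.0):
    """Points on or inside the margin: y * <w, x> <= tol."""
    return [(x, y) for x, y in data
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= tol]

def cascade_svm(blocks):
    """Map: train per block, keep support vectors. Reduce: retrain on their union."""
    pooled = []
    for block in blocks:
        w = train_linear_svm(block)
        pooled.extend(support_vectors(block, w))
    # fall back to all points if no block yielded margin points
    pooled = pooled or [p for block in blocks for p in block]
    return train_linear_svm(pooled)

# Toy linearly separable data, split into two "HDFS blocks".
random.seed(0)
pos = [([random.uniform(1, 2), random.uniform(1, 2)], 1) for _ in range(40)]
neg = [([random.uniform(-2, -1), random.uniform(-2, -1)], -1) for _ in range(40)]
data = pos + neg
random.shuffle(data)
blocks = [data[:40], data[40:]]

w = cascade_svm(blocks)
acc = sum(1 for x, y in data
          if (sum(wi * xi for wi, xi in zip(w, x)) > 0) == (y > 0)) / len(data)
print(f"training accuracy: {acc:.2f}")
```

On a real cluster, each call to train_linear_svm would run as a map task on one DataNode's block, and only the (much smaller) support-vector sets would travel over the network to the reduce step.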
In this paper, the kernel function type, the kernel parameters, and the penalty factor are optimized jointly with a genetic algorithm. The experimental results show that, compared with a traditional SVM whose parameters are set from empirical values, the SVM whose parameters are optimized by the genetic algorithm achieves significantly better prediction accuracy. A series of experiments on UCI standard data sets analyzed the feasibility and performance of the proposed algorithm in terms of training time, prediction accuracy, and other aspects. The results show that, compared with the traditional SVM, the parallel SVM noticeably reduces training time, with no significant decrease in prediction accuracy. The paper also uses the speedup ratio (serial training time divided by parallel training time) to analyze the relationship between the number of nodes and the training time required by the parallel algorithm; the experimental results show that the speedup ratio rises as the number of nodes in the cluster increases.
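The genetic-algorithm search over SVM parameters can be sketched as below. This is a hedged illustration only: the fitness function here is a smooth stand-in for what would, in the thesis's setting, be cross-validated SVM prediction accuracy, and the chromosome encoding, ranges, and operator choices are assumptions, not the thesis's actual design. The search evolves a population of (C, gamma) pairs by selection, crossover, and mutation.

```python
import math
import random

def fitness(C, gamma):
    # Stand-in surrogate for cross-validated accuracy; peaks at
    # C = 10, gamma = 0.1 (purely illustrative values).
    return math.exp(-((math.log10(C) - 1) ** 2 + (math.log10(gamma) + 1) ** 2))

def random_individual():
    # log-uniform sampling over typical SVM hyperparameter ranges
    return (10 ** random.uniform(-2, 3), 10 ** random.uniform(-4, 1))

def crossover(a, b):
    # uniform crossover: each gene comes from one parent
    return (random.choice((a[0], b[0])), random.choice((a[1], b[1])))

def mutate(ind, scale=0.3):
    # multiplicative mutation in log space keeps parameters positive
    C, g = ind
    return (C * 10 ** random.gauss(0, scale), g * 10 ** random.gauss(0, scale))

def ga_search(pop_size=20, generations=30):
    random.seed(1)
    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(*ind), reverse=True)
        elite = pop[: pop_size // 2]          # truncation selection
        children = [mutate(crossover(random.choice(elite), random.choice(elite)))
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=lambda ind: fitness(*ind))

best_C, best_gamma = ga_search()
print(f"best C={best_C:.3g}, gamma={best_gamma:.3g}")
```

In a real run, each fitness evaluation would train and validate an SVM, so the GA's population size and generation count trade search quality against total training cost.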
Keywords/Search Tags: Hadoop, Massive Data Mining, Genetic Algorithm, Support Vector Machine (SVM)