
Optimization And Application Of SVM Algorithm Based On Hadoop Distributed Platform

Posted on: 2013-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Zhang
GTID: 2248330362963682
Subject: Software engineering
Abstract/Summary:
The support vector machine (SVM) is a very effective supervised machine-learning method and is widely used in statistical classification and regression analysis. However, SVM faces a serious problem when training on large-scale datasets: training still occupies a large amount of memory and takes a long time. To address these two issues, this thesis presents a distributed SVM based on the Hadoop platform. It not only uses the parallel computing capability of the cluster to solve the problems of large memory occupation and long training time, but also provides highly reliable data storage and processing.

In the distributed SVM based on the Hadoop platform, the large-scale training dataset is randomly split into n subsets. Each subset is trained in parallel, and a large number of non-boundary samples (samples that do not become support vectors) are removed. The resulting support vectors are then combined. Next, the combined support vector set is divided again into k pieces, and each piece is trained. Finally, all results are joined into a global SVM, which is the distributed SVM model. Evaluation in an experimental environment shows that the algorithm significantly reduces training time while maintaining a high level of classification accuracy.

In this thesis, we extract HOG and LBP features from the INRIA Person human dataset and obtain a large sample set of 9774 samples with 9975 dimensions. We run the distributed SVM on a Hadoop platform with 3 machines in the lab. The experiments show that the training time of the distributed SVM is 1/5 of that of the standalone SVM, while the accuracy differs by only 0.1%.
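The two-stage training scheme summarized in the abstract can be illustrated with a small single-machine simulation. The Python sketch below is not the thesis's implementation and makes several assumptions: it swaps in a Pegasos-style linear SVM (the abstract does not name the solver or kernel), runs the n "map" tasks sequentially instead of on Hadoop, treats samples with y·f(x) ≤ 1 + tol as support vectors, and assumes the final "join" step retrains one model on the support vectors that survive stage 2. All function names (`train_linear_svm`, `support_vectors`, `random_split`, `distributed_svm`, `make_blobs`) and the toy 2-D data are hypothetical.

```python
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def augment(x):
    # Append a constant 1.0 so the last weight acts as a (regularized) bias term.
    return list(x) + [1.0]


def train_linear_svm(data, lam=0.01, iters=5000, seed=0):
    """Pegasos-style subgradient training of a linear soft-margin SVM.

    data is a list of (features, label) pairs with label in {-1, +1}.
    Returns a weight vector whose last entry is the bias.
    """
    rng = random.Random(seed)
    w = [0.0] * (len(data[0][0]) + 1)
    for t in range(1, iters + 1):
        x, y = data[rng.randrange(len(data))]
        xa = augment(x)
        eta = 1.0 / (lam * t)
        violated = y * dot(w, xa) < 1.0  # hinge loss is active for this sample
        w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink
        if violated:
            w = [wi + eta * y * xi for wi, xi in zip(w, xa)]
    return w


def support_vectors(data, w, tol=0.1):
    # Keep samples on or inside the margin; the others are treated as
    # non-boundary samples and discarded.
    return [(x, y) for x, y in data if y * dot(w, augment(x)) <= 1.0 + tol]


def random_split(data, parts, rng):
    shuffled = list(data)
    rng.shuffle(shuffled)
    return [shuffled[i::parts] for i in range(parts)]


def distributed_svm(data, n=4, k=2, seed=0):
    rng = random.Random(seed)
    # Stage 1: on Hadoop each subset would be an independent map task running in
    # parallel; this single-process sketch just trains them one after another.
    stage1 = []
    for i, subset in enumerate(random_split(data, n, rng)):
        if subset:
            w = train_linear_svm(subset, seed=seed + i)
            stage1.extend(support_vectors(subset, w))
    # Stage 2: re-split the pooled support vectors into k pieces and train each.
    stage2 = []
    for j, piece in enumerate(random_split(stage1, k, rng)):
        if piece:
            w = train_linear_svm(piece, seed=seed + n + j)
            stage2.extend(support_vectors(piece, w))
    # Join (assumed): train one global model on the surviving support vectors.
    w_global = train_linear_svm(stage2, seed=seed + n + k)
    return w_global, len(stage1), len(stage2)


def accuracy(w, data):
    return sum(1 for x, y in data if y * dot(w, augment(x)) > 0) / len(data)


def make_blobs(count, seed=1):
    # Hypothetical toy data: two 2-D Gaussian clouds stand in for HOG-LBP vectors.
    rng = random.Random(seed)
    data = []
    for _ in range(count):
        data.append(([rng.gauss(1.0, 0.6), rng.gauss(1.0, 0.6)], 1))
        data.append(([rng.gauss(-1.0, 0.6), rng.gauss(-1.0, 0.6)], -1))
    return data


if __name__ == "__main__":
    data = make_blobs(400)  # 800 samples
    w, n_sv1, n_sv2 = distributed_svm(data, n=4, k=2)
    print(f"stage-1 SVs: {n_sv1}, stage-2 SVs: {n_sv2}, "
          f"training accuracy: {accuracy(w, data):.3f}")
```

Because each stage discards samples outside the margin, the later trainings see far fewer points, which is where a scheme like this would gain its speedup. The quality of the final model then depends on the stage-1 filter not discarding samples that would be support vectors of the full problem.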
Keywords/Search Tags: Distributed SVM, Hadoop, MapReduce, Human Detection, HOG-LBP