Research Of Parameter Optimizational Distributed SVM Based On Hadoop Platform

Posted on:2017-05-04

Degree:Master

Type:Thesis

Country:China

Candidate:Y Wang

Full Text:PDF

GTID:2348330536476763

Subject:Electronic and communication engineering

Abstract/Summary:

In recent years, with the rapid development of computer and internet technology, big data has become a hot topic in society. A single computer cannot process big data with its own storage capacity and computing power alone, so a variety of big data platforms have emerged. Offering advantages such as high reliability, a high degree of encapsulation, and low cost, big data platforms have replaced the traditional distributed computing model. Hadoop is one of the most popular of these platforms.

Big data poses a major challenge to traditional machine learning methods such as the Support Vector Machine (SVM), which are suited to standalone operation. Computing time, however, increases exponentially with the size of the data. Training and prediction with a traditional serial SVM are inefficient, and a single machine can hardly provide enough computing power or memory to handle massive amounts of data. This thesis discusses in depth the structure of the Hadoop Distributed File System (HDFS) and the MapReduce programming framework, and shows their advantages over the traditional approach. For Cascade SVM, a classical SVM with a cascaded iterative structure, the thesis analyzes the algorithm on the Hadoop platform both theoretically and experimentally. Combining the experimental results with the characteristics of the Hadoop platform, it implements an iterative-structure SVM.

Because the algorithm searches for the globally optimal support vectors, the training process is very slow. The MapReduce-based SVM classification algorithm trains by combining the support vectors obtained from the data subsets and training on them again, and the efficiency and accuracy of the resulting classification model are not ideal. This thesis presents an improved method. After comparing the traditional Grid Search algorithm with the Particle Swarm Optimization (PSO) algorithm, it improves the standalone PSO. On this basis, by analyzing the Planet Parallel PSO (PP-PSO) algorithm and combining the improved standalone PSO with the characteristics of the Hadoop platform, it implements a new variant of PP-PSO (NPP-PSO). The experimental results show that, with default parameters, the distributed SVM algorithm is much faster than the standalone SVM algorithm while maintaining accuracy. Compared with the default parameters, optimizing the parameters of the distributed SVM with NPP-PSO clearly improves classification accuracy.
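
To make the parameter-search idea in the abstract concrete, the following is a minimal sketch of a standard global-best PSO in plain Python. It is not the thesis's improved PSO, PP-PSO, or NPP-PSO, and it does not run on Hadoop. The function names (pso_minimize, toy_cv_error), the swarm settings, and the toy objective are all assumptions made for illustration; in a real setting the objective would be something like the cross-validation error of an SVM trained with the candidate parameters.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=60,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Standard global-best PSO (illustrative, not the thesis's NPP-PSO).

    objective: function mapping a list of floats to a float to minimize.
    bounds:    list of (low, high) pairs, one per dimension.
    """
    rng = random.Random(seed)
    dim = len(bounds)

    # Random starting positions inside the bounds, zero starting velocities.
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]

    # Each particle remembers its own best point; the swarm shares one global best.
    best_pos = [p[:] for p in positions]
    best_val = [objective(p) for p in positions]
    g = min(range(n_particles), key=best_val.__getitem__)
    global_pos, global_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia + pull to personal best + pull to global best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_personal * r1 * (best_pos[i][d] - positions[i][d])
                    + c_global * r2 * (global_pos[d] - positions[i][d])
                )
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search range.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))

            val = objective(positions[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = positions[i][:], val
                if val < global_val:
                    global_pos, global_val = positions[i][:], val

    return global_pos, global_val


def toy_cv_error(params):
    """Hypothetical stand-in for SVM cross-validation error.

    Treats params as (log10 C, log10 gamma) and places the minimum at
    (1, -2); a real objective would train and validate an SVM instead.
    """
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


best_params, best_error = pso_minimize(toy_cv_error, [(-5.0, 5.0), (-5.0, 5.0)])
print(best_params, best_error)
```

A grid search over the same range evaluates every point of a fixed lattice, whereas the swarm concentrates its evaluations near promising regions. Because each particle's objective evaluation within an iteration is independent of the others, those evaluations are a natural unit to distribute across workers, which is the general motivation for parallel PSO variants.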

Keywords/Search Tags:

Machine Learning, Hadoop, MapReduce, Distributed Support Vector Machine, Distributed Particle Swarm Optimization

Related items

1	Modeling And Optimization For Combustion System Of The Boiler In Power Plants Based On Hadoop Big Data Platform
2	Research And Implementation Of A Distributed Vector Calculation Framework Based On MPI And MapReduce
3	Research Of Improved Particle Swarm Optimization Using MapReduce
4	Research On Distributed SVM Algorithm Based On Hadoop Platform
5	The Parameter Optimization Of Support Vector Machine Based On Improved Particle Swarm Optimization And Its Application
6	A Study On The Support Vector Machine Ensemble Learning Method Based On Particle Swarm Optimization
7	Research And Application Of Machine Learning Method Based On Swarm Intelligence Optimization
8	Research On Unsupervised Clustering Algorithm And Support Vector Machine And Their Application
9	Study On Least Square Support Vector Machine Algorithms And Their Applications
10	The Particle Swarm Optimization And Research And Application Of The Support Vector Machine