
Research on Parameter-Optimized Distributed SVM Based on the Hadoop Platform

Posted on: 2017-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
Full Text: PDF
GTID: 2348330536476763
Subject: Electronic and communication engineering
Abstract/Summary:
In recent years, with the rapid development of computer and internet technology, big data has become a topic of wide public attention. A single computer cannot handle big data with its own storage capacity and computing power, and a variety of big data platforms have emerged in response. Offering high reliability, a high degree of encapsulation, and low cost, these platforms have been replacing the traditional distributed computing model. Hadoop is one of the most popular of them.

Big data poses a serious challenge to traditional machine learning methods such as the Support Vector Machine (SVM), which are designed for standalone operation. Computing time grows steeply with the size of the data, so training and prediction with a traditional serial SVM are inefficient, and a single machine can hardly provide enough computing power or memory to handle massive amounts of data.

This thesis discusses in depth the structure of the Hadoop Distributed File System (HDFS) and the MapReduce programming framework, and shows their advantages over the traditional approach. For the Cascade SVM, a classical cascaded iterative SVM structure, it analyzes the algorithm on the Hadoop platform both theoretically and experimentally. Combining the experimental results with the characteristics of the Hadoop platform, it implements an iterative-structure SVM.

Training is very slow because it must find the globally optimal support vectors. The MapReduce-based SVM classification algorithm trains on data subsets, combines the support vectors from those subsets, and trains on them again; the efficiency and accuracy of the resulting classification model are not ideal. This thesis therefore presents an improved method. After comparing the traditional Grid Search algorithm with the Particle Swarm Optimization (PSO) algorithm, it improves the standalone PSO. On this basis, by analyzing the Planet Parallel PSO (PP-PSO) algorithm and combining the improved standalone PSO with the characteristics of the Hadoop platform, it implements a new PP-PSO variant (NPP-PSO).

The experimental results show that, with default parameters, the distributed SVM algorithm greatly improves computing speed over the standalone SVM algorithm while preserving accuracy. After NPP-PSO parameter optimization is applied to the distributed SVM, classification accuracy improves markedly compared with the default parameters.
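
To make the parameter-search idea concrete, the following is a minimal sketch of a basic standalone global-best PSO in Python. It is not the thesis's improved PSO, PP-PSO, or NPP-PSO, whose details the abstract does not give. The function names, the swarm settings (`inertia`, `c1`, `c2`), the log2 search ranges for C and gamma, and the `toy_cv_error` objective are all assumptions made for illustration; in a real setting the objective would be the cross-validation error of an SVM trained with the candidate parameters.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=60,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective(position) inside box bounds with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position; the swarm shares one global best.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia term + pull toward personal best + pull toward global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_cv_error(log_params):
    """Hypothetical stand-in for SVM cross-validation error (assumption, not real data).

    Its minimum is placed at log2(C) = 3 and log2(gamma) = -5 purely for illustration.
    """
    log_c, log_gamma = log_params
    return (log_c - 3.0) ** 2 + (log_gamma + 5.0) ** 2


# Assumed search box: log2(C) in [-5, 15], log2(gamma) in [-15, 3].
SEARCH_BOUNDS = [(-5.0, 15.0), (-15.0, 3.0)]
best_log_params, best_error = pso_minimize(toy_cv_error, SEARCH_BOUNDS)
best_c, best_gamma = 2.0 ** best_log_params[0], 2.0 ** best_log_params[1]
print(f"C = {best_c:.3f}, gamma = {best_gamma:.5f}, error = {best_error:.6f}")
```

A grid search over the same box evaluates every grid point regardless of how poor its neighbours are, whereas the swarm concentrates evaluations near promising regions. Because the particles within one iteration are evaluated independently, the objective evaluations are a natural candidate for distribution across map tasks, which is presumably the property a Hadoop-based parallel PSO exploits.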
Keywords/Search Tags: Machine Learning, Hadoop, MapReduce, Distributed Support Vector Machine, Distributed Particle Swarm Optimization