Parallel Support Vector Machine Algorithm Based On MapReduce

Posted on:2024-01-25

Degree:Master

Type:Thesis

Country:China

Candidate:X T Wang

Full Text:PDF

GTID:2568307124971479

Subject:Computer technology

Abstract/Summary:


With the rapid development of information technology, big data has been widely used in fields such as the Internet, social networks and the Internet of Things. Compared with traditional data, big data is characterized by large scale, fast flow, diverse types and low value density, which makes traditional classification algorithms unsuitable for it. As a classic supervised classification algorithm in machine learning, the Support Vector Machine (SVM) has strong generalization ability and good robustness, and can effectively avoid over-fitting. In the big data environment, however, the long training time and large memory usage of traditional SVM are the key factors limiting its application. With the development of parallel computing, parallel SVM has become an important research direction. The MapReduce distributed computing model provides a good platform for SVM to process massive data, and research on parallel SVM algorithms based on distributed computing models continues to attract attention. As data scale and dimensionality keep increasing, reducing the training time of parallel SVM, improving its parallelization efficiency and reducing its memory consumption have become urgent problems. The main work of this thesis is as follows:

(1) To address the sensitivity to noisy data, the difficulty of parameter selection and the low parallelization efficiency of parallel SVM algorithms in the big data environment, a parallel SVM algorithm based on Relief and BFO (bacterial foraging optimization), RBFO-PSVM, is proposed. First, a feature weight calculation strategy based on mutual information and Relief (MI-Relief) is designed to remove redundant features from the data set and reduce the interference of noisy data with parallel SVM. Then, a MapReduce-based MR-HBFO algorithm is proposed to select the optimal SVM parameters. Finally, a kernel clustering strategy (KCS) is proposed to reduce the size of the data set involved in parallel training, and a cross-fusion cascaded parallel support vector machine (CFCPSVM) model with an improved cascade SVM feedback mechanism is proposed. Combined with the MapReduce framework, this model trains SVM in parallel and improves the training efficiency of parallel SVM. Experimental results show that RBFO-PSVM performs better in the big data environment.

(2) To address unbalanced node load, excessive redundant data and high computational overhead in parallel SVM algorithms in the big data environment, a parallel SVM algorithm based on Fisher projection and clustering (MR-FKSVM) is proposed. First, a K-means-based data set partition strategy (DSK) divides the data set evenly into multiple subsets to balance the node load. Second, a redundant data pruning strategy based on Fisher projection (FS-FPD) filters out redundant data in advance, reducing the size of the data set involved in training. Finally, an adaptive shutdown strategy based on the Jaccard coefficient (ASS-JC) is proposed to improve the cascade SVM (CSVM) by removing low-utilization training layers, which reduces the computational overhead of parallel SVM; the parallel SVM model is then built on MapReduce, further improving training efficiency. Experimental results show that MR-FKSVM achieves better classification on large-scale data sets and is better suited to large-scale data environments.
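MI-Relief builds on Relief feature weighting. The abstract does not describe the mutual-information part of MI-Relief, so the following is only a minimal Python sketch of classic binary Relief, not the thesis's method. For every sample, it finds the nearest same-class neighbour ("hit") and the nearest other-class neighbour ("miss"). A feature's weight rises when that feature separates the sample from its miss and falls when it differs from its hit, so features with low weights are candidates for removal. The function name `relief_weights` is hypothetical.

```python
from typing import List, Sequence


def relief_weights(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Classic binary Relief (hypothetical helper, not MI-Relief itself).

    Every sample is used once (m = n), so the result is deterministic.
    """
    n, d = len(X), len(X[0])
    # Per-feature ranges normalize differences into [0, 1].
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]

    def diff(j: int, a: Sequence[float], b: Sequence[float]) -> float:
        return abs(a[j] - b[j]) / span[j]

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(diff(j, a, b) for j in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue
        h = min(hits, key=lambda k: dist(X[i], X[k]))    # nearest hit
        m = min(misses, key=lambda k: dist(X[i], X[k]))  # nearest miss
        for j in range(d):
            # Reward separation from the miss, penalize spread within the class.
            w[j] += (diff(j, X[i], X[m]) - diff(j, X[i], X[h])) / n
    return w
```

For example, with `X = [[0.0, 0.5], [0.1, 0.9], [1.0, 0.4], [0.9, 1.0]]` and `y = [0, 0, 1, 1]`, feature 0 (which separates the classes) receives a weight of about 0.8, while the noisy feature 1 receives a negative weight.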
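DSK uses K-means so that the data set is divided evenly across nodes. The abstract does not say how clusters are mapped to partitions. One assumption that satisfies both goals (equal sizes and representative subsets) is to cluster the data first, then deal each cluster's members round-robin across the p partitions. Partition sizes then differ by at most one, and every partition draws from every cluster. `kmeans`, `dsk_partition`, the first-k-points initialization and the fixed iteration count are all hypothetical simplifications.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def kmeans(X: Sequence[Sequence[float]], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's K-means; centers start at the first k points (assumption)."""
    centers = [list(X[i]) for i in range(k)]
    labels = [0] * len(X)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2
                                                 for a, b in zip(x, centers[c])))
                  for x in X]
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def dsk_partition(X: Sequence[Sequence[float]], k: int, p: int) -> List[List[int]]:
    """Hypothetical DSK-style split: cluster, then deal members round-robin."""
    labels = kmeans(X, k)
    clusters: Dict[int, List[int]] = defaultdict(list)
    for i, lab in enumerate(labels):
        clusters[lab].append(i)
    parts: List[List[int]] = [[] for _ in range(p)]
    slot = 0
    for c in sorted(clusters):
        for i in clusters[c]:
            parts[slot].append(i)
            slot = (slot + 1) % p  # round-robin keeps sizes within one of each other
    return parts
```

On two well-separated groups of four 2-D points, splitting into `p = 2` partitions gives each partition two points from each group. In a MapReduce job, each resulting index list would become one mapper's input split.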
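The abstract does not say how FS-FPD decides which samples are redundant. One plausible reading, stated here purely as an assumption, is that samples projecting far from the class boundary along Fisher's discriminant direction w = S_w^{-1}(m_1 - m_0) are unlikely to become support vectors. The sketch below computes that direction in pure Python and places a threshold midway between the projected class means. For each class, it keeps the fraction `keep_ratio` of samples closest to the threshold. `fisher_prune`, `keep_ratio` and the small `ridge` term (added so that S_w is invertible) are hypothetical.

```python
import math
from typing import List, Sequence


def _solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fisher_prune(X: Sequence[Sequence[float]], y: Sequence[int],
                 keep_ratio: float = 0.5, ridge: float = 1e-6) -> List[int]:
    """Hypothetical pruning rule; returns kept sample indices (labels 0/1)."""
    d = len(X[0])
    idx = {c: [i for i, t in enumerate(y) if t == c] for c in (0, 1)}
    mean = {c: [sum(X[i][j] for i in ids) / len(ids) for j in range(d)]
            for c, ids in idx.items()}
    # Within-class scatter S_w, with a small ridge on the diagonal.
    Sw = [[ridge if a == b else 0.0 for b in range(d)] for a in range(d)]
    for c, ids in idx.items():
        for i in ids:
            dx = [X[i][j] - mean[c][j] for j in range(d)]
            for a in range(d):
                for b in range(d):
                    Sw[a][b] += dx[a] * dx[b]
    # Fisher direction w = S_w^{-1} (m1 - m0).
    w = _solve(Sw, [mean[1][j] - mean[0][j] for j in range(d)])
    proj = [sum(wj * xj for wj, xj in zip(w, x)) for x in X]
    # Threshold halfway between the projected class means.
    thr = sum(wj * (m0 + m1) / 2 for wj, m0, m1 in zip(w, mean[0], mean[1]))
    keep: List[int] = []
    for ids in idx.values():
        k = max(1, math.ceil(keep_ratio * len(ids)))
        # Samples nearest the threshold are assumed likeliest to be support vectors.
        keep += sorted(ids, key=lambda i: abs(proj[i] - thr))[:k]
    return sorted(keep)
```

Keeping a fraction per class, rather than overall, ensures that both classes survive pruning, which the subsequent SVM training needs.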
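ASS-JC uses the Jaccard coefficient J(A, B) = |A ∩ B| / |A ∪ B| to detect cascade layers that contribute little. The abstract does not give the exact rule. As an assumption, the sketch below compares the support-vector sets produced by consecutive cascade SVM layers and stops once they are nearly identical, so the remaining layers are never trained. Layers are consumed lazily from an iterator to show that later layers are skipped rather than computed and discarded. `jaccard`, `run_cascade_with_shutdown` and the 0.9 threshold are hypothetical, and each layer's output is represented by a set of support-vector IDs rather than a trained SVM.

```python
from typing import Iterable, List, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def run_cascade_with_shutdown(layers: Iterable[Set[int]],
                              threshold: float = 0.9) -> List[Set[int]]:
    """Consume cascade layers until two consecutive support-vector sets
    are at least `threshold` similar; return the layers actually run."""
    done: List[Set[int]] = []
    for svs in layers:
        done.append(svs)
        if len(done) >= 2 and jaccard(done[-2], done[-1]) >= threshold:
            break  # later layers are assumed to add little; shut down early
    return done
```

If `layers` is a generator that trains each layer on demand, then with layer outputs `{1, 2, 3, 4, 5, 6}`, `{2, 3, 4, 5}`, `{2, 3, 4, 5}`, ... the loop stops after the third layer, and the fourth is never trained.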

Keywords/Search Tags:

Big Data, SVM algorithm, MapReduce model, Fisher projection


Related items

1	Research On Parallelization Of Clustering Algorithm Based On MapReduce
2	Study On X-ray CT Reconstruction Algorithm For Insufficient Projection Data
3	Research Of Parallel Apriori Algorithm Based On MapReduce Model
4	Research Of Frequent Itemsets Mining Algorithm Based On MapReduce Calculation Model
5	Algorithm To Deal With The Problem Of Data Skew In MapReduce Model
6	Research And Application Of Epicentre Based Data Model Projection Tool
7	Research On SVM Classification Algorithm Merged With Fisher Discriminant Analysis
8	Research On Distributed Fast Clustering Algorithm Based On Mapreduce
9	Research On The Optimization Of Support Vector Machine Algorithm Based On Fisher Discriminant Analysis
10	Research And Application Of Clustering Mining Algorithm Oriented Big Data Based On MapReduce