Parallel Support Vector Machine Algorithm Based On MapReduce

Posted on: 2024-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: X T Wang
GTID: 2568307124971479
Subject: Computer technology
Abstract/Summary:
With the rapid development of information technology, big data has been widely used in fields such as the Internet, social networks and the Internet of Things. Compared with traditional data, big data is characterized by large scale, fast flow, diverse types and low value density, which makes traditional classification algorithms unsuitable for it. As a classic supervised classification algorithm in machine learning, the Support Vector Machine (SVM) has strong generalization ability and good robustness, and can effectively avoid overfitting. In the big data environment, however, the long training time and large memory usage of the traditional SVM are the key factors that limit its use. With the development of parallel technology, parallel SVM has become an important research direction. The MapReduce distributed computing model provides a good platform for SVM to process massive data, and parallel SVM algorithms based on this model are receiving continuous attention from researchers. As data scale and dimensionality keep growing, reducing the training time of parallel SVM algorithms, improving their parallelization efficiency and reducing their memory consumption have become urgent problems. To address them, the main work of this thesis is as follows:

To address the noise sensitivity, difficult parameter selection and low parallelization efficiency of parallel SVM in the big data environment, a parallel SVM algorithm based on Relief and BFO (RBFO-PSVM) is proposed. First, a feature weight calculation strategy based on mutual information and Relief (MI-Relief) removes redundant features from the data set and reduces the interference of noisy data with parallel SVM. Then, a MapReduce-based MR-HBFO algorithm selects the optimal SVM parameters. Finally, a kernel clustering strategy (KCS) reduces the size of the data set involved in parallel training, and a cross-fusion cascaded parallel support vector machine (CFCPSVM) model, which improves the feedback mechanism of the cascaded SVM, is trained in parallel under the MapReduce framework, improving the training efficiency of parallel SVM. The experimental results show that the RBFO-PSVM algorithm performs better in the big data environment.

To address the unbalanced node load, excessive redundant data and high computing overhead of parallel SVM in the big data environment, a parallel SVM algorithm based on Fisher projection and clustering (MR-FKSVM) is proposed. First, a K-means-based data set partition strategy (DSK) divides the data set evenly into multiple subsets to balance the node load. Second, a redundant data pruning strategy based on Fisher projection (FS-FPD) filters out redundant data in advance, reducing the size of the data set involved in training. Finally, an adaptive shutdown strategy based on the Jaccard coefficient (ASS-JC) improves CSVM by removing training layers with low utilization, which reduces the computational overhead of parallel SVM; the parallel SVM model is then built on MapReduce, further improving training efficiency. The experimental results show that the MR-FKSVM algorithm classifies large-scale data sets more effectively and is better suited to large-scale data environments.
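The abstract does not say how ASS-JC applies the Jaccard coefficient, so the following is only a minimal sketch under one plausible assumption: that consecutive cascade layers are compared by the overlap of their support-vector index sets, and that training stops once the overlap is high. The names `jaccard` and `should_stop` and the `threshold` value of 0.9 are hypothetical and do not come from the thesis.

```python
def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two collections of hashable items."""
    a, b = set(a), set(b)
    if not a and not b:
        # Convention chosen here: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(a | b)


def should_stop(prev_sv_ids, curr_sv_ids, threshold=0.9):
    """Hypothetical shutdown rule (an assumption, not the thesis's ASS-JC).

    Returns True when the support-vector index sets of two consecutive
    cascade layers overlap by at least `threshold`, i.e. when another
    layer would be unlikely to change the model much.
    """
    return jaccard(prev_sv_ids, curr_sv_ids) >= threshold
```

Under this assumption, a layer whose support vectors barely differ from those of the previous layer would be treated as low utilization and skipped; the actual criterion used in the thesis may differ.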
Keywords: Big Data, SVM algorithm, MapReduce model, Fisher projection