Research And Optimization On Semiparametric Support Vector Machine Under Spark Framework

Posted on:2020-11-30

Degree:Master

Type:Thesis

Country:China

Candidate:Q Q Wang

Full Text:PDF

GTID:2428330590959391

Subject:Computer software and theory

Abstract/Summary:

The rapid development of big data technology has driven continuous improvement in techniques for analyzing and processing massive data sets, and machine learning algorithms that perform well on small samples are gradually being applied to big data learning scenarios. The semi-parametric support vector machine (S-SVM) is a computational model that combines the advantages of parametric and nonparametric models: it can control the complexity of the classifier and trains efficiently. On big data, however, its computation time becomes relatively long. This thesis studies the S-SVM algorithm in a big data environment and uses the Spark computing framework to parallelize and improve it.

The S-SVM studied here uses the Sparse Greedy Matrix Approximation (SGMA) algorithm to build the predefined model and the Iterative Reweighted Least Squares (IRWLS) procedure to compute the weights. To address the long running time on big data, two methods are proposed that successively improve the computational efficiency of the algorithm:

(1) A parallel implementation of S-SVM on Spark. It uses Spark RDDs to share data in memory, which reduces the amount of data transferred over the network and the number of disk I/O operations, and it uses Cholesky matrix decomposition to break the computation into a series of sub-tasks that can be executed in parallel.

(2) Building on the parallel S-SVM, a combination of k-means and SGMA to construct the predefined model. The cluster centers produced by k-means are used to compute the kernel matrix in SGMA, which improves efficiency by reducing the size of the matrix and the amount of computation.

Experiments show that the Spark-based parallel S-SVM is more computationally efficient than the original single-machine algorithm while achieving almost the same classification performance. The improved parallel S-SVM matches the first parallel version in classification accuracy and AUC and runs in less time. Moreover, the number of cluster centers, the one new parameter introduced by the improved algorithm, has little influence on its performance. Finally, in comparison with the BPPGD, P-PackSVM and SVMwithSGD algorithms, the final optimized algorithm shows an overall advantage in classification accuracy, AUC, training time and classification time.
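The idea in item (2), combined with the Cholesky step from item (1), can be illustrated with a small single-machine sketch. It is not the thesis's implementation: it uses plain Python rather than Spark, it takes the k-means centers directly as the kernel basis instead of running SGMA's greedy selection, and it performs only one weighted least-squares solve with fixed weights rather than the full IRWLS loop. All function names, the RBF kernel, and the values of gamma and lam are assumptions made for illustration.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; returns k centers.
    Assumption: naive deterministic start from the first k points."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign the point to its nearest center (squared distance).
            j = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old center if its group is empty
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers


def reduced_kernel(points, centers, gamma=1.0):
    """n x k RBF kernel matrix between data and centers (not n x n)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, c)))
             for c in centers] for x in points]


def cholesky(m):
    """Lower-triangular L with L * L^T == m (m symmetric positive definite)."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][t] * low[j][t] for t in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_weights(K, y, a, lam=1e-3):
    """Solve (K^T A K + lam*I) beta = K^T A y with A = diag(a), via Cholesky.
    One weighted least-squares step; IRWLS would update `a` and repeat."""
    n, k = len(K), len(K[0])
    M = [[sum(a[i] * K[i][p] * K[i][q] for i in range(n))
          + (lam if p == q else 0.0) for q in range(k)] for p in range(k)]
    b = [sum(a[i] * K[i][p] * y[i] for i in range(n)) for p in range(k)]
    low = cholesky(M)
    z = [0.0] * k  # forward substitution: low * z = b
    for i in range(k):
        z[i] = (b[i] - sum(low[i][t] * z[t] for t in range(i))) / low[i][i]
    beta = [0.0] * k  # back substitution: low^T * beta = z
    for i in reversed(range(k)):
        s = sum(low[t][i] * beta[t] for t in range(i + 1, k))
        beta[i] = (z[i] - s) / low[i][i]
    return beta


# Toy data (made up for this sketch): two well-separated groups.
pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (3.0, 3.0), (3.1, 2.9), (2.9, 3.2)]
labels = [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]

centers = kmeans(pts, k=2)
K = reduced_kernel(pts, centers, gamma=0.5)
beta = solve_weights(K, labels, a=[1.0] * len(pts))
preds = [1.0 if sum(w * v for w, v in zip(beta, row)) > 0 else -1.0
         for row in K]
print(preds)
```

With n samples and k centers, the kernel matrix here is n x k and the system handed to the Cholesky solver is only k x k, which is the kind of reduction in matrix size the abstract credits for the speed-up. How the thesis distributes these steps across Spark RDD partitions is not described in the abstract and is not modeled here.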

Keywords/Search Tags:

Spark, Semiparametric Support Vector Machine, kmeans, parallelization

Related items

1	Research On Some Problems Of Support Vector Machine Learning Algorithm
2	Min-Max Modular SVM Based On Cloud Platform
3	Optimization And Application Of SVM Algorithm Based On Spark
4	Massive Text Classification Parallelization Technology Based On Support Vector Machine
5	Analysis And Research Of Machine Learning Model Based On Spark
6	Research On Robust Support Vector Machines
7	Researches On Some Problems In Nonparallel Hyperplanes Support Vector Machine And Feature Extraction
8	The Doubly Regularized Support Vector Machine With A Globally Linearly Convergent Algorithm
9	Research On Parallel SVM Algorithm Based On Spark
10	Research On The Classification Algorithm Of Unbalance Data Based On Spark