Support Vector Machines (SVMs) require substantial training time and memory when applied to large-scale datasets. Solving them iteratively with stochastic gradient descent, and running the solver in a parallel environment, can reduce both. Taking this as its starting point, this thesis explores a more efficient way to solve the problem. The main work covers two aspects. (1) An Improved Weighted Linear Stochastic Gradient Descent (IWLSGD) method is proposed to solve the SVM from the perspective of the primal problem (IWLSGD-SVM). By introducing a penalty function, the constrained primal problem of the SVM is converted into an unconstrained problem. This unconstrained form is a convex optimization problem, which can be solved iteratively with an optimization algorithm. However, the traditional way of solving the SVM requires a large amount of matrix computation, which is time-consuming. To improve computational speed, this thesis designs a Linear Stochastic Gradient Descent (LSGD) algorithm for solving the SVM (LSGD-SVM). Because most data in practical problems are imbalanced across classes, the LSGD-SVM algorithm is then improved by correcting the classification hyperplane with a weighting scheme, in order to raise classification accuracy. The weights take into account the relative number of samples in the two classes, which to some extent avoids extreme, one-sided decisions. The designed algorithms are applied to test data, and the experiments show that solving the SVM with stochastic gradient descent is faster than the traditional solution method, and that the proposed IWLSGD-SVM algorithm outperforms LSGD-SVM in both classification accuracy and running time. (2) A parallel computing model based on the Spark framework is proposed. First, a distributed platform is built on a single computer, the cluster configuration files are set up, and the platform is connected to external compilers. Then, in order to improve
the computational efficiency, this thesis designs a batch synchronous parallel IWLSGD-SVM model. The model adopts data parallelism and uses the synchronous parallel mode as its communication scheme. Finally, two groups of wind turbine data of different scales are used to verify the effectiveness of the batch synchronous parallel algorithm. The experiments show that, on large-scale datasets, this method achieves higher classification accuracy than the single-machine mode and takes less time, and therefore solves the problem more efficiently.
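
The two ideas summarized above, a class-weighted stochastic gradient solver for the unconstrained primal SVM and a synchronous data-parallel update, can be illustrated with a minimal sketch. It is not the thesis's IWLSGD-SVM implementation: the abstract does not give the exact penalty function or weighting formula, so the hinge-loss objective, the inverse-frequency class weights, the decaying step size, and all function names below are assumptions made for illustration. The parallel step is simulated with a plain loop over in-memory partitions rather than with Spark.

```python
import numpy as np


def class_weights(y):
    """Assumed weighting: inversely proportional to each class's share of the
    samples, so the minority class contributes as much total loss as the
    majority class. Requires both labels (+1 and -1) to be present."""
    n = len(y)
    n_pos = np.sum(y == 1)
    n_neg = np.sum(y == -1)
    return {1: n / (2.0 * n_pos), -1: n / (2.0 * n_neg)}


def weighted_hinge_subgradient(w, b, X, y, weights, lam):
    """Subgradient of  lam/2 * ||w||^2 + mean_i c_i * max(0, 1 - y_i (w.x_i + b)),
    an unconstrained (penalty) form of the linear soft-margin SVM."""
    margins = y * (X @ w + b)
    active = margins < 1                      # samples violating the margin
    c = np.where(y == 1, weights[1], weights[-1])
    coef = -(c * y * active)                  # per-sample loss derivative
    grad_w = lam * w + X.T @ coef / len(y)
    grad_b = coef.mean()
    return grad_w, grad_b


def train_weighted_sgd_svm(X, y, lam=0.01, eta0=0.1, epochs=20,
                           batch_size=32, seed=0):
    """Mini-batch SGD on the weighted primal objective (single machine)."""
    rng = np.random.default_rng(seed)
    weights = class_weights(y)
    w = np.zeros(X.shape[1])
    b = 0.0
    t = 0
    for _ in range(epochs):
        order = rng.permutation(len(y))
        for start in range(0, len(y), batch_size):
            idx = order[start:start + batch_size]
            t += 1
            eta = eta0 / (1.0 + eta0 * lam * t)   # assumed decaying step size
            gw, gb = weighted_hinge_subgradient(w, b, X[idx], y[idx], weights, lam)
            w = w - eta * gw
            b = b - eta * gb
    return w, b


def synchronous_parallel_step(w, b, partitions, weights, lam, eta):
    """One synchronous data-parallel round, simulated sequentially: every
    partition computes a subgradient at the same (w, b), the results are
    averaged, and a single update is applied. The plain average equals the
    full-data subgradient only when partitions have equal size."""
    grads = [weighted_hinge_subgradient(w, b, Xp, yp, weights, lam)
             for Xp, yp in partitions]
    gw = np.mean([g[0] for g in grads], axis=0)
    gb = np.mean([g[1] for g in grads])
    return w - eta * gw, b - eta * gb
```

In a real Spark deployment the loop inside `synchronous_parallel_step` would correspond to a map over data partitions followed by an aggregation on the driver; the sketch only shows the arithmetic of one such round.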