
A Linear Separable Support Vector Machine For Large Samples

Posted on: 2019-12-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y Qiao
Full Text: PDF
GTID: 2417330566976962
Subject: Statistics
Abstract/Summary:
With the explosive growth of industry data, the concept of big data has developed considerably. Because such data are large in volume and have complex and diverse features, traditional support vector machine (SVM) classification algorithms are no longer practical in big data environments, and research on SVM classification under big data has drawn close attention from many fields. To apply the SVM to the rapid classification of massive sample data, it is necessary to select a set of potential support vectors from the large sample data set and use it as the SVM training set, which improves learning efficiency. When the sample size is large, the complexity of training an SVM increases dramatically and training consumes a large amount of time, which makes the SVM difficult to adopt for learning from massive sample data. The separating hyperplane of a support vector machine is determined by the support vectors alone; the other training sample points have no effect on it. This thesis therefore reduces large-scale data to small-scale data, learns support vectors on the small-scale data, and iterates to obtain the final support vectors. First, a grouping algorithm for the linearly separable SVM is proposed. The algorithm randomly divides the large sample into several groups of small-sample training data sets. Training on one small-sample data set yields potential support vectors, which are added to the next group for training, and so on. The support vectors obtained from training on the last group are taken as the support vectors of the large sample data set. Second, a misclassification sample preselection algorithm is proposed. It is based on the decisive role that the support vectors play in determining the separating hyperplane. Sample points far from the separating hyperplane are removed from the large training data set, and the remaining suspect samples are extracted and used for training. In this way the support vector machine not only uses the useful information in all the samples but also saves training time and greatly improves training efficiency. The experimental results show that the support vectors obtained by the two proposed algorithms are exactly the same as those obtained by convex quadratic programming, while the algorithms reduce the learning difficulty and running time of the support vector machine and offer real-time performance and high efficiency.
Keywords/Search Tags:Support vector machine, separation hyperplane, large sample, grouping algorithm, misclassification sample preselection algorithm
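
The grouping procedure described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the group count, and the toy data are all assumptions made for illustration. To stay self-contained, the trainer is a closed-form hard-margin SVM for one-dimensional separable data in which negatives lie to the left of positives, so the support vectors are simply the largest negative and the smallest positive point. In higher dimensions a convex quadratic programming solver would take its place, and whether a single pass then recovers exactly the full-data support vectors is a claim of the thesis that this sketch does not test.

```python
import random


def fit_1d_hard_margin(samples):
    """Hypothetical trainer: hard-margin SVM for 1-D separable data.

    Assumes every negative point lies to the left of every positive point.
    Returns the support vectors as (x, label) pairs: the largest negative
    and the smallest positive. If a class is absent from `samples`, only
    the boundary point of the class that is present is returned.
    """
    neg = [x for x, y in samples if y == -1]
    pos = [x for x, y in samples if y == +1]
    support = []
    if neg:
        support.append((max(neg), -1))
    if pos:
        support.append((min(pos), +1))
    return support


def grouped_support_vectors(samples, n_groups, fit=fit_1d_hard_margin, seed=0):
    """Chain training over random groups, carrying support vectors forward."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    # Random split into n_groups small training sets of nearly equal size.
    groups = [shuffled[i::n_groups] for i in range(n_groups)]

    carried = []  # potential support vectors found so far
    for group in groups:
        # Train on the current group plus the support vectors carried over.
        carried = fit(carried + group)
    return carried


# Toy data (an assumption): two well-separated 1-D classes.
rng = random.Random(42)
data = [(rng.uniform(-10.0, -1.0), -1) for _ in range(5000)]
data += [(rng.uniform(1.0, 10.0), +1) for _ in range(5000)]

sv_grouped = grouped_support_vectors(data, n_groups=20)
sv_full = fit_1d_hard_margin(data)  # reference: train on everything at once
print(sv_grouped == sv_full)
```

Each call to the trainer sees only one group plus at most two carried points, which is the source of the time saving the abstract describes. Passing a different `fit` callable is how a real solver would be swapped in.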