
Construction Method For Training Data Set In Classification Algorithm Of Support Vector Machines

Posted on: 2010-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: X Yu
Full Text: PDF
GTID: 2178360278466678
Subject: Computer application technology
Abstract/Summary:
Support vector machine (SVM) is a machine learning method proposed by Vapnik and other scholars. Grounded in statistical learning theory and optimization theory, it is an implementation of the structural risk minimization principle from statistical learning theory. In practical applications, however, the data to be processed is often massive or imbalanced, so improving SVM's ability to handle complicated data and broadening its range of application has become an active research topic. The work in this thesis addresses this problem and can be summarized as follows.

First, for classification problems in which the training samples are imbalanced and some classes contain very few samples, a novel method for constructing virtual samples based on the Gaussian distribution is introduced. Because the method is grounded in the theory of the Gaussian distribution, the correctness of the virtual samples can be assured on the one hand, while on the other hand the method makes full use of prior knowledge and can be applied to many kinds of classification problems. The method is incorporated into the SVM algorithm and evaluated experimentally on the Iris and KDD Cup 99 data sets. The results show that it makes full use of prior knowledge and constructs a sufficient number of correct virtual samples, so classification precision is improved effectively.

Second, to address the slow running of SVM algorithms when processing massive data, a novel method for pre-extracting support vectors based on improved vector projection is introduced, building on a deep investigation of the characteristics of support vectors. In the linearly separable case, the method uses the best projection vector, obtained by the Fisher linear discriminant algorithm, instead of the central vector.
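The Gaussian virtual-sample idea above can be sketched roughly as follows. This is a minimal illustration, not the thesis's exact procedure: it assumes the virtual samples are drawn from a Gaussian fitted to the minority class, and the function name and regularization constant are invented for the sketch.

```python
import numpy as np

def make_virtual_samples(X_minority, n_new, rng=None):
    """Fit a Gaussian to the minority-class samples and draw
    n_new virtual samples from it (assumed interpretation of
    the Gaussian-distribution construction)."""
    rng = np.random.default_rng(rng)
    mu = X_minority.mean(axis=0)
    # Regularize the covariance so sampling works even when the
    # class has very few samples.
    cov = np.cov(X_minority, rowvar=False) + 1e-6 * np.eye(X_minority.shape[1])
    return rng.multivariate_normal(mu, cov, size=n_new)

# Usage: augment a 5-sample minority class with 45 virtual samples.
rng = np.random.default_rng(0)
X_min = rng.normal(loc=[1.0, 2.0], scale=0.3, size=(5, 2))
X_virtual = make_virtual_samples(X_min, n_new=45, rng=1)
print(X_virtual.shape)  # (45, 2)
```

The augmented set (real plus virtual samples) would then be fed to an ordinary SVM trainer; because the virtual samples follow the estimated class distribution, they encode the prior knowledge of the minority class rather than arbitrary noise.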
In the nonlinearly separable case, the method uses the true central vector instead of the approximate central vector in the feature space. By selecting a more rational projection vector, the method can use fewer margin vectors in place of the original vectors during training while keeping the classification effect good. It greatly reduces the number of training samples and speeds up training. Experimental results also demonstrate the validity and feasibility of the method.
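The projection-based pre-extraction can be sketched as follows for the linearly separable case. This is a rough sketch under stated assumptions: the Fisher direction is computed from the regularized within-class scatter, and samples whose projections lie closest to the opposite class are kept as candidate support vectors; the function name and the `keep_ratio` parameter are inventions for illustration, not the thesis's notation.

```python
import numpy as np

def preextract_candidates(X1, X2, keep_ratio=0.3):
    """Pre-select likely support vectors by projecting each class
    onto the Fisher linear discriminant direction and keeping the
    samples nearest the other class."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter, regularized so it is invertible.
    Sw = np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False)
    Sw += 1e-6 * np.eye(Sw.shape[0])
    w = np.linalg.solve(Sw, mu1 - mu2)  # Fisher discriminant direction
    p1, p2 = X1 @ w, X2 @ w
    k1 = max(1, int(keep_ratio * len(X1)))
    k2 = max(1, int(keep_ratio * len(X2)))
    # Class 1 projects to larger values along w, so its margin
    # samples have the smallest projections; vice versa for class 2.
    idx1 = np.argsort(p1)[:k1]
    idx2 = np.argsort(p2)[::-1][:k2]
    return X1[idx1], X2[idx2]

# Usage: two well-separated Gaussian clouds of 100 points each.
rng = np.random.default_rng(0)
Xa = rng.normal([2.0, 2.0], 0.5, size=(100, 2))
Xb = rng.normal([-2.0, -2.0], 0.5, size=(100, 2))
Ca, Cb = preextract_candidates(Xa, Xb)
print(len(Ca), len(Cb))  # 30 30
```

Only the retained candidates would then be passed to SVM training, which is where the speed-up comes from: training cost grows quickly with sample count, while the discarded interior points are unlikely to be support vectors.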
Keywords/Search Tags: statistical learning theory, support vector machine, Gaussian distribution, vector projection, virtual samples