
The Influence Of The Data Distribution Over Support Vector Machines

Posted on: 2013-01-05    Degree: Master    Type: Thesis
Country: China    Candidate: F Zhu    Full Text: PDF
GTID: 2218330362966830    Subject: Computer application technology

Abstract/Summary:
For large-scale data sets, SVM training requires selecting a subset of the training set. This in turn requires studying the distribution of the samples and finding the important ones. This thesis focuses on Support Vector Machines and tries to identify, within the training set, the samples that could become support vectors. Such selection relieves the memory and training-time pressure of solving the quadratic programming problem. Compared with existing sample selection algorithms, some of the methods proposed in this research apply to both two-class (and multi-class) classification and one-class classification. For incremental learning, only the historical samples that could turn into support vectors need to be retained, which avoids retraining on both the historical samples and the additional samples.

The work of this thesis can be summarized as follows:

1. The samples that may become support vectors usually lie in the overlap region, and some algorithms try to find this region before SVM learning; for some data sets, however, no overlap region exists. Based on the distribution properties of a sample's neighbors, the sample-neighbor angle is proposed. Summing the cosines of the sample-neighbor angles shows that, for a sample lying near the boundary of the data distribution, the cosine sum is close to k (where k is the number of nearest neighbors). Compared with methods that use clustering algorithms, this criterion does not depend on the performance of a clustering algorithm. Compared with the NPPS algorithm proposed by Shin & Cho and the CBD algorithm proposed by Navneet Panda, it needs no assumption that the training set has an overlap region, and it can be applied to one-class classification problems as well (see the first sketch below).

2. An incremental learning algorithm for support vector machines must find the historical samples that could change into support vectors, in order to avoid retraining on both the historical samples and the additional samples. This thesis introduces, during incremental learning, the angle between the difference vector from a historical sample to a new sample and the historical separating plane. The smaller this angle is, the more likely the historical sample is to become a support vector. Compared with the algorithm proposed by Syed, this method has better accuracy; its speed is also faster than the algorithm proposed by Liva Ralaivola & Florence d'Alché-Buc. The angle is equivalent to the distance between the historical sample and the original classification plane: the smaller the distance, the more likely the sample becomes a support vector (see the second sketch below).

3. According to the slack variable, support vectors can be classified into linearly separable support vectors, generalized linearly separable support vectors, and non-linearly separable support vectors; linearly separable support vectors carry more information. By preserving only a portion of the support vectors (the generalized linearly separable ones) during incremental learning, the accuracy stays close to that of Syed's algorithm while the speed is faster (see the third sketch below).
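The abstract does not spell out how the cosine sum is computed. A minimal sketch of one plausible reading follows: for each sample, take the angle between each of its k neighbor directions and their mean direction; when all neighbors lie roughly on one side (a boundary sample), the sum of cosines approaches k. The function name cosine_sum_scores and the choice of the mean neighbor direction as the reference vector are assumptions for illustration, not taken from the thesis.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def cosine_sum_scores(X, k=10):
    """Score each sample by the sum of cosines between its k neighbor
    directions and their mean direction (an assumed reading of the
    sample-neighbor angle); scores near k suggest the sample sits near
    the boundary of the data distribution."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)              # column 0 is the sample itself
    scores = np.empty(len(X))
    for i, neigh in enumerate(idx):
        d = X[neigh[1:]] - X[i]            # vectors to the k neighbors
        d /= np.linalg.norm(d, axis=1, keepdims=True) + 1e-12
        m = d.sum(axis=0)
        m /= np.linalg.norm(m) + 1e-12     # mean neighbor direction
        scores[i] = (d @ m).sum()          # sum of cosines, at most k
    return scores

# Illustrative use: keep boundary samples as support-vector candidates
# (the 0.9 * k threshold is an arbitrary example, not from the thesis).
# candidates = X[cosine_sum_scores(X, k=10) > 0.9 * 10]
```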
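For the criterion in point 2, the abstract states that the angle test is equivalent to the distance from a historical sample to the current separating plane. A minimal sketch of that equivalent form, assuming a linear-kernel scikit-learn SVC so the geometric margin can be read off directly; the helper name select_historical and the keep_ratio parameter are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def select_historical(model, X_hist, y_hist, keep_ratio=0.2):
    """Keep the historical samples closest to the current separating
    plane; by the equivalence in point 2, these are the most likely to
    become support vectors once new data arrives."""
    w_norm = np.linalg.norm(model.coef_)              # linear kernel only
    dist = np.abs(model.decision_function(X_hist)) / w_norm
    keep = np.argsort(dist)[: max(1, int(keep_ratio * len(X_hist)))]
    return X_hist[keep], y_hist[keep]

# One incremental round: train, prune history, retrain with the new batch.
# model = SVC(kernel="linear").fit(X_hist, y_hist)
# X_keep, y_keep = select_historical(model, X_hist, y_hist)
# model = SVC(kernel="linear").fit(np.vstack([X_keep, X_new]),
#                                  np.hstack([y_keep, y_new]))
```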
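Point 3's three-way split of support vectors can be read off from the slack values xi_i = max(0, 1 - y_i f(x_i)). A sketch assuming a binary SVC with labels in {-1, +1}; the mapping of the three thesis categories onto xi = 0, 0 < xi <= 1, and xi > 1 follows the standard soft-margin interpretation, and the helper name and tolerance are hypothetical.

```python
import numpy as np

def partition_support_vectors(model, X, y, tol=1e-3):
    """Split a fitted binary SVC's support vectors by slack value:
    xi ~ 0      -> on the margin ('linearly separable' SVs),
    0 < xi <= 1 -> inside the margin, correctly classified
                   ('generalized linearly separable' SVs),
    xi > 1      -> misclassified ('non-linearly separable' SVs)."""
    sv = model.support_                         # indices of support vectors
    xi = np.maximum(0.0, 1.0 - y[sv] * model.decision_function(X[sv]))
    on_margin   = sv[xi <= tol]
    generalized = sv[(xi > tol) & (xi <= 1.0)]
    misclass    = sv[xi > 1.0]
    return on_margin, generalized, misclass
```

Under the thesis's scheme, the generalized linearly separable group is the portion preserved across incremental rounds.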
Keywords/Search Tags: Support Vector Machines, Sample Selection, Cosine Sum, One-Class SVM, Incremental Learning, KKT conditions