
Study on Support Vector Machine Based on Classification Noise Detection

Posted on: 2016-12-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Y Xia
Full Text: PDF
GTID: 1108330503952370
Subject: Computer Science and Technology
Abstract/Summary:
The support vector machine (SVM) is an excellent machine learning method based on statistical learning theory. For the simplest case, the linearly separable problem, the maximum-margin rule, which is consistent with structural risk minimization, is used to build its basic convex programming model; this gives the SVM good generalization ability, and because the model is convex, a globally optimal solution can be obtained. On this foundation, the programming model for linearly non-separable problems is obtained by introducing slack variables and a penalty parameter, and kernel function theory is further used to handle nonlinear problems so that the so-called “curse of dimensionality” can be avoided. This excellent performance has made the SVM widely applied in pattern recognition, function approximation and density estimation, and has made it a research hotspot in machine learning.

Focusing on the computation of relative density in high-dimensional datasets and its combination with the SVM in classification problems through self-aware techniques, and centering on the SVM training process and the problem of overfitting with the aim of speeding up training, this thesis mainly covers the following topics:

① By analyzing the characteristics of noisy data in classification, the concept of “classification noise” is introduced and a relative-density model for classification noise detection is proposed. Noisy data in a classification problem reduce the smoothness of the decision curve and weaken the generalization ability of the decision function, which leads to overfitting, so detecting such data is of great significance. To address the inability of existing algorithms to detect these noisy data efficiently, and based on the observation that the density of a classification noise point among homogeneous (same-class) samples is lower than its density among heterogeneous (other-class) samples, the relative-density model is built to detect classification noise efficiently. The experimental results show that the relative-density model detects classification noise effectively.
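To make the relative-density idea in ① concrete, the following minimal Python sketch flags a sample as classification noise when its estimated density among same-class samples is lower than its density among other-class samples. The k-nearest-neighbour density estimate, the parameter k and the threshold are illustrative assumptions, not the dissertation's exact model.

    import numpy as np

    def knn_density(point, samples, k=5):
        # crude density estimate: inverse of the mean distance to the k nearest
        # samples, ignoring the point itself (zero distance) if it is present
        d = np.sort(np.linalg.norm(samples - point, axis=1))
        d = d[d > 1e-12][:k]
        return 1.0 / (d.mean() + 1e-12) if d.size else np.inf

    def detect_classification_noise(X, y, k=5, threshold=1.0):
        # a sample is flagged as classification noise when its relative density
        # (same-class density divided by other-class density) falls below the
        # threshold
        noise = np.zeros(len(y), dtype=bool)
        for i in range(len(y)):
            same, other = X[y == y[i]], X[y != y[i]]
            rel = knn_density(X[i], same, k) / knn_density(X[i], other, k)
            noise[i] = rel < threshold
        return noise

Samples flagged in this way would then be removed before training, which is the step that CNSMO in ② builds on.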
② By eliminating classification noise, a non-separable problem can be converted into a separable one, which simplifies the SVM model. Combining this with sequential minimal optimization, CNSMO (Classification Noise Detection based Sequential Minimal Optimization) is proposed. To address the problem that the cross-validation used by existing SVM algorithms considerably increases training time, CNSMO eliminates classification noise so that the decision curve becomes smooth and overfitting is avoided; therefore, even though cross-validation is not used during training, CNSMO still achieves good prediction accuracy. At the same time, eliminating classification noise converts the non-separable problem into a separable one, so the penalty parameter no longer needs to be tuned, which considerably simplifies the iterative model of the Lagrange multipliers. The experimental results show that the training time of the SVM is reduced considerably without any loss of prediction accuracy, and that CNSMO also has good stability.

③ Differences in location between points are measured through the distances and angles formed with a few fixed reference points, so that the Euclidean distances between points need not be computed directly; on this basis, LDBA (Location Difference Based Algorithm) is proposed. To address the poor performance in high-dimensional data of the existing neighbor-search algorithms used in the computation of relative density, LDBA uses the angles and distances formed by a sample and the reference points to measure the difference between samples, which avoids computing the pairwise Euclidean distances. LDBA therefore has low time complexity, and because it does not rely on any tree structure, it maintains good performance in high-dimensional data (see the sketch after this summary). The experimental results show that LDBA achieves prediction accuracy similar to the baseline algorithm and higher efficiency than other comparable algorithms.

④ By incorporating LDBA into the CNSMO algorithm, LD-CNSMO (Location Difference and Classification Noise based Sequential Minimal Optimization) is proposed. To address CNSMO's poor performance in high-dimensional data, LDBA is used in the computation of relative density to detect and eliminate classification noise. Because LDBA does not rely on a tree structure, LD-CNSMO maintains good efficiency in high-dimensional data. The experimental results show that LD-CNSMO is clearly more efficient than CNSMO and other algorithms on high-dimensional data.

In summary, after introducing the concept of classification noise, this thesis uses an LDBA-based relative-density model to detect and eliminate classification noise, and combines this model with the SVM so that cross-validation can be avoided during training. Without affecting generalization ability, the training efficiency of the proposed algorithm on both low- and high-dimensional data is improved considerably and its stability is strengthened; as a whole, the performance of the SVM is effectively improved.
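As an illustration of the location-difference measure in ③, the Python sketch below summarises each sample by its distances to a few randomly chosen reference points and compares these short signatures instead of full pairwise Euclidean distances. The choice of reference points, the L1 comparison of signatures and the omission of angles are illustrative assumptions rather than the dissertation's exact LDBA.

    import numpy as np

    def location_signatures(X, ref_points):
        # each sample is summarised by its distances to a few fixed reference
        # points; comparing these short signatures stands in for computing the
        # full pairwise Euclidean distances in the original d-dimensional space
        return np.linalg.norm(X[:, None, :] - ref_points[None, :, :], axis=2)

    def candidate_neighbors(X, n_refs=3, k=10, seed=0):
        # candidate neighbours of a sample are the samples whose signatures
        # differ least from its own (L1 difference of the signature vectors)
        rng = np.random.default_rng(seed)
        refs = X[rng.choice(len(X), size=n_refs, replace=False)]
        sig = location_signatures(X, refs)                      # shape (n, n_refs)
        diff = np.abs(sig[:, None, :] - sig[None, :, :]).sum(axis=2)
        np.fill_diagonal(diff, np.inf)                          # exclude self
        return np.argsort(diff, axis=1)[:, :k]                  # k candidates each

In this sketch, the candidate neighbours could then feed the same-class and other-class density estimates used for classification noise detection, mirroring the role LDBA plays inside LD-CNSMO.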
Keywords/Search Tags: support vector machine, kernel function, classification noise, relative density, location difference, nearest neighbor searching, sequential minimal optimization