
Support Vector Machine-based Data Mining Method

Posted on: 2006-07-02    Degree: Master    Type: Thesis
Country: China    Candidate: X D Du    Full Text: PDF
GTID: 2208360155466774    Subject: Systems Engineering
Abstract/Summary:
Built on statistical learning theory, the support vector machine (SVM) is regarded as a new generation of learning machine. Its main advantage is that it handles small-sample learning problems better by replacing Empirical Risk Minimization with Structural Risk Minimization. Moreover, an SVM can treat a nonlinear learning problem as a linear one, since it maps the original data into a kernel space in which only linear problems need to be solved. The study of support vector machines has become a new hotspot in the field of machine learning.

This paper first reviews the principles of SVM and then introduces some important recent learning algorithms. Addressing several open problems of SVM, its main contributions lie in three aspects:

1. Optimizing the preprocessing of SVM

First, this paper introduces a method to decrease the training time of large-scale support vector machines and to reduce the influence of outliers. Many SVM algorithms normally start from a random subset. A method is therefore proposed that tries to find a starting subset better than a random one, and so accelerates the optimization process. It estimates which training vectors are likely to be support vectors in the high-dimensional projected space of the SVM. The same method is used to detect outliers and to eliminate their influence.

2. High-Dimensional Center Support Vector

A support vector machine builds the optimal classification function on a small part of the training samples, the support vectors. Because all the information about the classification is carried by the support vectors alone, SVM is sensitive to noise and outliers in the training set.
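The starting-subset idea above can be sketched as follows. This is a minimal illustration, not the thesis's actual algorithm: it assumes scikit-learn's `SVC`, trains a pilot model on a small random subset, and treats the points closest to the pilot's decision boundary as the likely support vectors of the full problem.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Illustrative data; the thesis's datasets are not specified here.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# Pilot model on a small random subset (the usual random starting set).
rng = np.random.default_rng(0)
idx = rng.choice(len(X), size=200, replace=False)
pilot = SVC(kernel="rbf", C=1.0).fit(X[idx], y[idx])

# Estimate which training vectors are likely support vectors of the
# full problem: those with the smallest margin under the pilot model.
margins = np.abs(pilot.decision_function(X))
candidates = np.argsort(margins)[:500]  # 500 points nearest the boundary

# Train the final machine on this informed subset instead of all 2000.
final = SVC(kernel="rbf", C=1.0).fit(X[candidates], y[candidates])
print(final.score(X, y))
```

Points with the *largest* margins but the wrong pilot label would, under the same logic, be candidate outliers to discard.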
For this reason, a new method, the High-Dimensional Center Support Vector machine (HCSVM), is presented. It takes the distances between the centers of the high-dimensional data as the original optimization problem, exploiting the property that nonlinearly separable data can be mapped to linearly separable data in a high-dimensional space. Simulation results demonstrate that the resulting machines are less sensitive to noise and outliers and can handle nonlinear problems.

3. Incremental SVM Learning

Incremental data classification has become one of the key technologies of intelligent knowledge discovery. However, its parameters are very difficult to determine, so a new method is introduced that uses ν-SVM to adjust the parameter automatically. In naive incremental retraining, the old support vectors have little influence on the result, because the empirical error on the second batch drastically outweighs the error on the old SVs. To compensate for this problem, an error on the old support vectors (which represent the old learning set) is made more costly than an error on a new example. This method improves classification accuracy. Finally, the original optimization problem, the Lagrange function, and the dual Lagrange function are presented.

In closing, a summary is given, and some open problems and the prospects of support vector machines are pointed out.
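The incremental weighting idea can be sketched as below. This is a hedged illustration of the principle only, again assuming scikit-learn: the old model's support vectors are carried over into the new training set with a larger per-sample cost, so the new batch's empirical error cannot drown them out. The weight value 5.0 and the batch split are illustrative choices, not figures from the thesis.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Two batches of illustrative data.
X, y = make_classification(n_samples=1000, n_features=8, random_state=1)
X_old, y_old = X[:500], y[:500]
X_new, y_new = X[500:], y[500:]

# First batch: ordinary training; keep only its support vectors,
# which summarize the old learning set.
old_model = SVC(kernel="rbf").fit(X_old, y_old)
sv = old_model.support_  # indices of old support vectors in X_old

# Incremental step: old SVs plus the new batch.
X_inc = np.vstack([X_old[sv], X_new])
y_inc = np.concatenate([y_old[sv], y_new])

# Make an error on an old support vector more costly than an error
# on a new example (weight 5.0 is an illustrative choice).
w = np.concatenate([np.full(len(sv), 5.0), np.ones(len(X_new))])
inc_model = SVC(kernel="rbf").fit(X_inc, y_inc, sample_weight=w)
print(inc_model.score(X, y))
```

The ν-SVM variant mentioned in the text corresponds to `sklearn.svm.NuSVC`, where the ν parameter bounds the fraction of margin errors and support vectors; the weighting scheme applies the same way.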
Keywords/Search Tags: Support vector machine, Pattern recognition, Incremental algorithm, Statistical learning theory, Neighborhood algorithm