Research On Some Problems And Applications In Support Vector Machines

Posted on: 2009-03-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W L Liu
Full Text: PDF
GTID: 1118360272465565
Subject: Applied Mathematics
Abstract/Summary:
Statistical Learning Theory (SLT) provides a powerful theoretical basis for machine learning from small samples. Through the principle of Structural Risk Minimization, it integrates techniques from statistics, machine learning, and neural networks, and markedly improves the generalization ability of algorithms built on Empirical Risk Minimization. The Support Vector Machine (SVM) is a powerful machine learning method that grew out of this theoretical system. SVM has broad applicability and good development prospects, since it handles many practical difficulties that trouble existing learning methods, such as small samples, nonlinearity, overfitting, high dimensionality, and local minima. As the best available theory for small-sample learning, SLT and SVM have received wide attention and become a new research hotspot in machine learning and artificial intelligence.

This dissertation first reviews existing SVM algorithms and applications. It then studies in detail several problems of current interest: unbalanced classification, de-sampling and de-noising, the combination of the two support vector algorithms SVM and Support Vector Domain Description (SVDD), the properties and applications of core vectors, and the application of SVDD to uncertain group decision making. The main work is as follows.

1. An adjustment method for unbalanced support vector machines. Learning from unbalanced data sets is regarded as one of the open problems in machine learning, and the difficulty stems mainly from the unbalanced data itself: the minority class has too few samples to reflect the true distribution of its class. As a result, the standard SVM often misclassifies minority-class samples, so the minority class attains low precision even when the overall classification accuracy is high. This dissertation proposes an algorithm that adjusts the separating hyperplane between two unbalanced classes: the information carried by the sample projection distribution and the sample sizes determines the ratio of the two classes' penalty factors, which yields a new separating hyperplane. Experimental results show that the method performs well.

2. De-sampling and de-noising. Two problems arise when SVM is used for classification: outliers (noise) in the sample set lower the classification precision, and large-scale sample sets demand large memory and long training times. Using probability theory, we analyze the typical locations and proportions of outliers (noise) and surplus samples, and propose de-sampling methods based on Euclidean distance and on kernel distance, respectively. Experiments show that the proposed method generally maintains or improves classification precision relative to standard SVM; for large samples it maintains precision while greatly improving classification speed, which makes it highly practical.
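For illustration, here is a minimal sketch of the asymmetric-penalty idea behind contribution 1, written in Python with scikit-learn. The dissertation derives the ratio of the two penalty factors from the sample projection distribution and the sample sizes; the inverse-class-frequency ratio used below is only a common stand-in for that ratio.

# Sketch: two-class SVM with asymmetric penalty factors C+ and C-.
# The weight ratio n0/n1 is a heuristic stand-in for the dissertation's
# projection-based ratio, not its actual formula.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

n0, n1 = np.bincount(y_tr)                      # class sizes
clf = SVC(kernel="rbf", C=1.0,
          class_weight={0: 1.0, 1: n0 / n1})    # minority class penalized more
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))

Similarly, a rough sketch of kernel-distance de-sampling for contribution 2: points far from their class mean in RBF feature space are treated as likely outliers and removed. The quantile threshold is illustrative; the dissertation instead locates outliers and surplus samples through a probabilistic analysis.

# Sketch: drop points whose feature-space distance to the class mean is
# extreme. ||phi(x_i) - mu||^2 = K_ii - (2/n) sum_j K_ij + (1/n^2) sum_jl K_jl.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def kernel_distances(X, gamma=0.5):
    K = rbf_kernel(X, gamma=gamma)
    return np.diag(K) - 2.0 * K.mean(axis=1) + K.mean()

def desample(X, keep=0.9, gamma=0.5):
    d = kernel_distances(X, gamma)
    return X[d <= np.quantile(d, keep)]   # keep the central fraction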
3. A classifier combining SVM and SVDD. Because every sample participates in SVM training, SVM demands large memory and long training times, while the Support Vector Domain Classifier (SVDC) classifies quickly but with low precision. To reduce the training time of SVM and raise the precision of SVDC, we build a new separating hyperplane based on SVDD. The algorithm treats the classification information as a whole and realizes a combination of SVM and SVDD. Experiments show that the method is effective.

4. The concept of the core vector, applied to improving SVM. To extract sample information efficiently, we remove all support vectors and locate core vectors by choosing appropriate SVDD parameters. Linear and radial basis kernel functions are used to describe the sample data and study the properties of core vectors. We prove theoretically that a core vector has maximal density with respect to the corresponding parameters in the given sample set, and hence that a core vector carries the most information in the sample set. A core vector can therefore be regarded as an expectation point of the sample set; furthermore, the core vector set can be trained to find control vectors that improve SVM.

5. SVDD applied to uncertain group decision making. We study two kinds of inverse judgment problems: fuzzy judgment and interval judgment. For fuzzy judgment, the fuzzy reciprocal judgment is taken as the standard, and expert weights are determined by each expert's informational contribution, with SVDD used to find the common information. For interval judgment, expert weights are likewise determined by informational contribution, with SVDD used to extract the group information; here interval judgment matrices are decomposed into point vectors and a radial basis kernel function is applied. This work makes full use of SVDD's descriptive power and retains the main information, which suits uncertain group decision problems well. The method both broadens the research scope of SVDD and provides an effective technique for studying uncertain decision making.
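As a rough illustration of the SVDD machinery used in contributions 3-5: for kernels with constant k(x, x), such as the RBF kernel, SVDD's minimum enclosing ball in feature space is known to coincide with the one-class SVM, so scikit-learn's OneClassSVM can serve as a stand-in. The "core-vector-like" point below (the point with the highest decision value, i.e. the most central point of the description) is only a loose analogue of the dissertation's core vector, whose exact construction differs.

# Sketch: SVDD-style domain description via the one-class SVM (RBF kernel).
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))

svdd = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1).fit(X)
scores = svdd.decision_function(X)   # > 0 inside the description, < 0 outside
boundary = X[svdd.support_]          # support vectors describe the domain
central = X[np.argmax(scores)]       # most central point (core-vector-like)
print(len(boundary), central)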
Keywords/Search Tags: Statistical learning theory, Support vector machine (SVM), Support vector domain description (SVDD), Kernel function, Unbalanced data, De-sampling, Outliers (noise), Core vector, Inverse judgment, Group decision