
Support Vector Machine Learning Under Noisy And Overlapping Data

Posted on: 2003-07-04    Degree: Master    Type: Thesis
Country: China    Candidate: S W Ji    Full Text: PDF
GTID: 2168360062996407    Subject: Computer application technology
Abstract/Summary:
Any process that can improve the performance of a system can be called learning. It is generally believed that learning lies at the very core of both biological and artificial intelligence. With the recent development of the Internet and database technology, vast amounts of complex, high-dimensional data are emerging, which creates an urgent need for efficient nonlinear adaptive information-processing algorithms for complex data analysis.

Statistical learning theory derives necessary and sufficient conditions for consistency and fast convergence of the Empirical Risk Minimization principle, which underlies most traditional learning algorithms, and it provides the theoretical foundation for support vector algorithms. The support vector learning algorithm is based on the Structural Risk Minimization principle. It combines two remarkable ideas: maximum-margin classifiers with low capacity and therefore good generalization, and implicit feature spaces defined by kernel functions. Built on these ideas, the support vector machine (SVM) elegantly overcomes the curse of dimensionality and the over-fitting problem. Owing to its remarkable performance in many applications, the SVM is receiving increasing attention.

Training an SVM can be formulated as a quadratic programming problem. For large learning tasks with many training examples, off-the-shelf optimization techniques quickly become intractable in their memory and time requirements, so many efficient techniques have been developed. These techniques divide the original problem into several smaller sub-problems; by solving these sub-problems iteratively, the original, larger problem is solved. All of the proposed methods suffer from the bottleneck of long training time, especially when the data are noisy and overlapping.

It is observed, both experimentally and from the structure of the algorithm, that when training data are noisy and overlapping, many support vectors have Lagrange multipliers at the upper bound. If it were known beforehand which examples are bound support vectors, these examples could be removed from the training set with their multipliers fixed at the upper bound. Because this reduces the number of free variables, the approach is promising for improving training time. According to the theoretical derivation of the SVM training process, the training result is not compromised as long as all training examples satisfy the optimality conditions before the algorithm terminates. This thesis proposes a heuristic rule that decides which examples are bound support vectors based on their past behavior during training: if an example's multiplier stays at the bound for a given number of consecutive iterations, it is regarded as a bound support vector and removed from the training set; for the remainder of training its multiplier is fixed at the upper bound and it is not included in the working set. To ensure that the final solution is globally optimal, the modified training algorithm checks every bound support vector against the optimality conditions before terminating; if any bound support vector still violates them, training is restarted until all training examples satisfy the optimality conditions.

The original and modified algorithms are compared on many standard benchmark datasets, including face detection and the USPS and MNIST handwritten digit recognition problems. Numerical results show that the modified algorithm outperforms the original algorithm in training time in most cases, especially when the training data are noisy and overlapping.
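For context, the quadratic program referred to above is the standard soft-margin SVM dual; it is stated here for reference and is not quoted from the thesis itself:

\[
\max_{\alpha}\;\; \sum_{i=1}^{n}\alpha_i \;-\; \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j\, y_i y_j\, K(x_i,x_j)
\qquad \text{s.t.}\quad 0 \le \alpha_i \le C,\;\; \sum_{i=1}^{n}\alpha_i y_i = 0 .
\]

Examples whose multipliers satisfy \(\alpha_i = C\) at the solution are the "bound support vectors" discussed in the abstract.

The following Python sketch illustrates the bookkeeping such a heuristic implies: counting how many consecutive iterations each multiplier sits at the bound, freezing it at C after a patience threshold, and checking the KKT condition for bound examples before terminating. The class name, the `patience` threshold, and the tolerances are illustrative assumptions; the abstract does not specify the thesis's actual rule or implementation.

```python
import numpy as np

class BoundSVTracker:
    """Minimal sketch: track how many successive decomposition iterations each
    alpha_i stays at the upper bound C; after `patience` consecutive hits,
    freeze it there and drop it from future working sets."""

    def __init__(self, n_samples, C, patience=3, tol=1e-8):
        self.C = C
        self.patience = patience
        self.tol = tol
        self.hits = np.zeros(n_samples, dtype=int)     # consecutive at-bound counts
        self.frozen = np.zeros(n_samples, dtype=bool)  # multipliers fixed at C

    def update(self, alpha):
        """Call once per decomposition iteration with the current multipliers.
        Returns a mask of examples still eligible for the working set."""
        at_bound = np.abs(alpha - self.C) < self.tol
        self.hits = np.where(at_bound, self.hits + 1, 0)
        # Freeze examples that have sat on the bound `patience` times in a row.
        self.frozen |= self.hits >= self.patience
        return ~self.frozen


def bound_kkt_violators(alpha, y, decision, C, eps=1e-3):
    """Final optimality check for bound support vectors (alpha_i = C): the KKT
    conditions require y_i * f(x_i) <= 1; violators force a restart."""
    at_bound = np.isclose(alpha, C)
    return np.where(at_bound & (y * decision > 1 + eps))[0]
```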
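A toy invocation of the sketch (solver details omitted; the alpha values are made up for illustration):

```python
tracker = BoundSVTracker(n_samples=5, C=1.0, patience=2)
alpha = np.array([1.0, 0.2, 1.0, 0.0, 1.0])
free_mask = tracker.update(alpha)   # first pass: nothing frozen yet
free_mask = tracker.update(alpha)   # second pass: indices 0, 2, 4 frozen at C
print(free_mask)                    # [False  True False  True False]
```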
Keywords/Search Tags:Statistical Learning Theory, Support Vector Machines, Complexity Regularization, Machine Learning, Pattern Recognition