
Research On Theory And Algorithm For Support Vector Machine Classifier

Posted on: 2009-05-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G S Wang
Full Text: PDF
GTID: 1118360245970113
Subject: Signal and Information Processing

Abstract/Summary:
Machine learning studies how computers can simulate human learning behavior so as to acquire new knowledge or skills, reorganize existing knowledge, and thereby improve their own abilities. It is one of the major subjects of artificial intelligence and a fundamental route to making computers intelligent. A system without learning ability does not qualify as an intelligent system; however, systems built in the past usually lacked learning ability. For instance, their inference was confined to deduction, so they could at most prove existing facts and theorems but could not discover new theorems, laws, or rules. As artificial intelligence research deepened, these limitations became increasingly evident.

Over the past decades, many kinds of machine learning methods have been proposed. By the relationship between the given environment and the desired target, they fall into three groups:

Induction: given instances of a concept, we hope to obtain a description of the concept from these instances, or to improve an existing description of it.

Analogy: given a description of a new problem, we search for an analogous problem solved before and hope to handle the new problem on the basis of that experience.

Deduction: the problem can be solved with an existing knowledge base, but the knowledge base cannot be utilized effectively, so we hope to transform it into another form.

In fact, analogy can be regarded as a composition of induction and deduction, so the underlying learning strategies are only induction and deduction. Induction is the process of deriving general principles from particular facts or instances; the resulting knowledge goes beyond the existing knowledge base, and we call this knowledge-level learning. Deduction is a process of fidelity-preserving transformation or specialization; although it improves learning efficiency, the resulting knowledge is still implicit in the existing knowledge base, and we call this symbol-level learning.
From the viewpoint of implementation, inductive learning belongs to statistics-based methods, while deductive learning belongs to rule-based methods. The advantages of rule-based methods are that they are simple and efficient, and that new rules can easily be added. Because of exceptions, however, it is very difficult to keep the rules consistent as the knowledge base grows; furthermore, rule-based methods depend heavily on domain knowledge and are hard to transplant. In contrast, statistics-based methods require large amounts of training data, which must be readily obtainable, and are usually computationally expensive, but they are independent of the application field and easy to transplant.

From the mid-1960s to the mid-1980s, since artificial intelligence research centered on symbolic systems and knowledge-based methods, machine learning stressed rule-based methods. In the late 1980s, as neural networks matured, statistics-based methods made rapid progress. Especially in the recent decade, owing to the popularization of the internet and the spread of computer applications into many fields, huge amounts of data have streamed toward us; discovering the knowledge hidden in them poses a challenge to machine learning and, at the same time, brings development opportunities to statistics-based methods. Among the numerous statistics-based methods, the support vector machine (SVM) is a rising star that has won favor because of its solid theoretical foundation and excellent performance.

The support vector machine arose in the mid-1990s and has become a focus of machine learning research in recent years. Although only about ten years have passed since then, researchers have made great progress. The SVM not only has a solid theoretical foundation but has also won universal commendation in practice; for example, it has held the best records in handwritten digit recognition and text categorization.
This technique is good in both theory and practice, unmatched by classic machine learning methods, and its potential is encouraging. For the above reasons, I chose the support vector machine as the topic of my thesis.

The theoretical foundation of the support vector machine is statistical learning theory, which originated in the late 1960s. Over the subsequent twenty years, Vapnik, a Soviet scientist, made tremendous pioneering efforts to develop the theory, presenting the well-known structural risk minimization principle. Being purely theoretical, the achievements of this period did not arouse due attention. In 1995, Cortes and Vapnik introduced the support vector machine in their paper "Support-Vector Networks", which implements the structural risk minimization principle. The support vector machine has powerful generalization ability, and its excellent performance in handwritten digit recognition, face detection, pedestrian recognition, text categorization, large-scale bioinformatics processing, and so on has received increasing attention. Research groups have formed one after another in universities and companies at home and abroad, more and more people have joined the research ranks, and a hotspot of machine learning has taken shape. Basic research concerns training algorithms, model selection, multi-class problems, etc.; applied research concerns 3D object recognition, nonlinear pattern reconstruction, intelligent signal processing, information retrieval, text categorization, data mining, and so on. Nowadays, support vector learning has become a mainstream technique of machine learning.
Without exaggeration, just as information theory opened the way for information technology, statistical learning theory has brought a deep change to machine learning. The support vector machine is essentially a nonlinear data processing method and differs from neural networks: the former is based on the structural risk minimization principle, while the latter rests on the empirical risk minimization principle. The structural risk minimization principle rests on firm mathematical foundations and profoundly changes our understanding of machine learning.

The support vector machine has the following characteristics:

(1) Simple structure.

(2) Convex optimization, hence no local minima.

(3) Sparse representation. The direction of the optimal separating hyperplane is a linear combination of the training samples, and the coefficient of each sample reflects its importance. All information about the classification is contained in the support vectors, whose coefficients are nonzero. If we remove the non-support vectors, or shift them slightly, re-training leads to the same solution as before; in other words, the solution depends only on the support vectors.

(4) Modularization. The SVM is composed of two modules, a general-purpose learning machine and a domain-specific kernel function, so we can design the learning algorithm and the kernel function in a modular way. This is valuable for both theoretical analysis and engineering implementation.

(5) It is essentially a linear function on the feature space induced by the kernel function, and thus easy to study theoretically.

The important ideas and methodologies behind the support vector machine are as follows:

(1) Margin maximization. By means of the optimal separating hyperplane, also called the maximal margin hyperplane, the structural risk minimization principle is implemented and generalization ability is guaranteed.

(2) Dual representation. In the dual representation, training samples appear only in the form of dot products, so we can replace the dot product with a kernel function.

(3) Kernel strategy.
To go from a linear classifier to a nonlinear one, all we need is to substitute a kernel function for the dot product; the original linear algorithm remains unchanged, and all the merits of the linear classifier, such as simple computation and the absence of local minima, are inherited. Through the kernel function we can manipulate the mapped samples in feature space without a substantial increase in computational cost, and we also obtain a good way to represent sophisticated functions. After introducing the kernel function, the dimension of the feature space becomes less important; we need not even know the specific form of the feature mapping, thereby avoiding the "curse of dimensionality". We can construct a variety of classifiers simply by changing the kernel function.

At the beginning, the support vector machine was designed for classification; its ideas were subsequently extended to other fields, for example regression, function approximation, and density estimation, as well as principal component analysis, nearest neighbor methods, and Fisher discrimination. The kernel strategy has developed into a methodology, and many important data processing methods can be placed in this unified framework. This thesis focuses only on the support vector machine classifier.

As a developing technique, the support vector machine is by no means flawless; there are many issues still to be studied, for example:

1. Training algorithm

Training an SVM amounts to solving a quadratic optimization problem. This problem is considered challenging because its Hessian matrix is completely dense, so the memory needed to store the problem grows with the square of the number of data points. Therefore, the training problems arising in real applications with large data sets cannot be loaded into memory and cannot be solved using standard constrained nonlinear optimization algorithms. For instance, a training set of 50,000 examples yields a matrix with 2.5 billion elements, which cannot fit into the memory of a standard computer.
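The kernel substitution described above can be checked numerically on a toy example. For the homogeneous degree-2 polynomial kernel k(x, y) = (x·y)² in two dimensions, the kernel value equals the dot product of the explicitly mapped vectors φ(x) = (x₁², √2·x₁x₂, x₂²). The following sketch is our illustration, not part of the thesis:

```python
import math

def poly_kernel(x, y):
    # Homogeneous polynomial kernel of degree 2: k(x, y) = (x . y)^2
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def feature_map(x):
    # Explicit feature map for this kernel in 2D:
    # phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, y = (1.0, 2.0), (3.0, -1.0)
lhs = poly_kernel(x, y)                     # evaluated in input space
rhs = dot(feature_map(x), feature_map(y))   # dot product in feature space
print(lhs, rhs)  # both equal (1*3 + 2*(-1))^2 = 1.0
```

The kernel evaluates the feature-space dot product without ever forming φ(x), which is the point of the "kernel strategy": the feature space may be high- or even infinite-dimensional, yet the cost stays that of an input-space computation.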
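The memory figure quoted above follows from simple arithmetic; the 8-bytes-per-entry assumption (double precision) is ours, not the thesis's:

```python
n = 50_000                    # number of training examples
entries = n * n               # the dense kernel (Hessian) matrix has n^2 entries
bytes_per_entry = 8           # assuming double-precision floats
gigabytes = entries * bytes_per_entry / 1024 ** 3

print(entries)                # 2500000000, i.e. 2.5 billion entries
print(round(gigabytes, 1))    # about 18.6 GB of storage for the full matrix
```

This quadratic growth is why decomposition methods, which work on a small subset of the matrix at a time, dominate large-scale SVM training.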
Therefore, people have long been seeking fast and memory-saving algorithms. Researchers lay their emphasis on different aspects, such as linear versus nonlinear SVMs, online versus offline algorithms, exact versus approximate algorithms, and so on.

2. Model selection

By model selection we mean how to select the kernel function and the associated parameters for a given problem. These parameters include the penalty coefficient C, which trades off training errors against generalization ability, and the parameters of the kernel function, such as σ in the Gaussian kernel and p in the polynomial kernel. Different parameters correspond to different feature spaces and feature mappings and are closely related to generalization ability. How can model selection be conducted automatically?

3. Knowledge embedding

By knowledge we mean information other than the training samples, for example experts' experience and domain knowledge. The standard SVM is based on training samples alone; it is very difficult to embed knowledge into an SVM because the feature mapping is hidden. However, experience has taught us that the amount of knowledge used, and the degree to which it is exploited, reflect the level of a learning system; this is particularly important for solving specific problems. How to embed knowledge into an SVM has yet to be fundamentally resolved.

4. Multi-class problems

The SVM was originally designed for binary classification; however, problems arising in practice are often multi-class. How can it be extended from binary to multi-class classification? Multi-class problems usually involve large-scale data sets; how can they be trained effectively?

My thesis investigates the above issues; the chief contributions are as follows.

(1) Present a framework for statistical learning theory with information. In classical statistical learning theory, all important results are based on the assumption that the training samples come either from an unknown but fixed distribution or from an arbitrary distribution, which are two extreme cases.
In general, we neither have a thorough grasp of, nor know nothing about, the problems we confront; my framework can describe this intermediate situation (see Chapter 2).

(2) Survey the development of support vector machine research on six subjects, namely training algorithms, variants, generalization ability, model selection, multi-class problems, and applications (see Chapter 3).

(3) Present decision criteria, with mathematical proofs, for three important types of kernel functions, namely translation-invariant, rotation-invariant, and convolution kernel functions (see Chapter 4).

(4) Analyze the NPA algorithm proposed by Keerthi in detail, identify its shortcomings, and make substantial improvements to the iterative procedures in the first and second types of test. The new version is stable and speeds up convergence remarkably without increasing the computational cost (see Chapter 5).

(5) Develop a classification simulation system based on our improved NPA algorithm (see Chapter 6).

This thesis consists of seven chapters and is organized as follows. In Chapter 1 we formulate what a support vector machine is, summarize its characteristics and key ideas, and investigate the similarities and differences between it and traditional neural networks. In Chapter 2 we review statistical learning theory concisely and precisely, indicate its connection with the support vector machine, and present a new framework for statistical learning theory with information. In Chapter 3 we summarize comprehensively and systematically the development of support vector machines on six subjects, namely training algorithms, variants, generalization, model selection, multi-class problems, and applications. The key to enhancing the performance of a support vector machine is to design an appropriate kernel function for the specific problem; for this, it is essential to understand the properties of kernel functions themselves.
In Chapter 4 we discuss the general properties of kernel functions; in particular, for three types of kernel functions, namely translation-invariant, rotation-invariant, and convolution kernel functions, we present decision criteria and give their mathematical proofs, with which we construct many kernel functions, including a self-adaptive kernel function. The advantage of the support vector machine lies in handling nonlinear problems, but designing a training algorithm for a large-scale nonlinear support vector machine is harder than for a linear one. In Chapter 5 we analyze in detail the NPA algorithm proposed by Keerthi in 2000, identify its shortcomings, and make substantial improvements to the iterative procedures in the first and second types of test; our new version is stable and speeds up convergence remarkably without increasing the computational cost. In Chapter 6 we develop an auto-classification simulation system based on our improved NPA algorithm; its key module can be transplanted into practical systems. Finally, Chapter 7 summarizes open problems and makes suggestions for further study.
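A translation-invariant kernel, one of the three types studied in Chapter 4, depends on its arguments only through the difference x − y; the Gaussian kernel is the standard example. The following sketch (our illustration, with an arbitrarily chosen σ and shift vector) checks numerically that shifting both arguments by the same vector leaves the kernel value unchanged:

```python
import math

def gaussian_kernel(x, y, sigma=1.5):
    # Gaussian (RBF) kernel: k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    # It depends on x and y only through x - y, so it is translation-invariant.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))

x, y, t = (0.3, -1.2), (2.0, 0.5), (10.0, -7.0)
x_shift = tuple(a + s for a, s in zip(x, t))
y_shift = tuple(b + s for b, s in zip(y, t))

print(gaussian_kernel(x, y))
print(gaussian_kernel(x_shift, y_shift))  # identical: the shift t cancels in x - y
```

The decision criteria of Chapter 4 characterize exactly when a function of x − y of this kind is a valid (positive-definite) kernel; the numerical check above illustrates only the invariance property itself.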
Keywords/Search Tags: support vector machine, algorithm, kernel function, machine learning