The Study Of Incremental Learning Algorithm Of Support Vector Machine And Its Application In Intrusion Detection

Posted on:2009-01-11

Degree:Master

Type:Thesis

Country:China

Candidate:Y Wang

Full Text:PDF

GTID:2178360242980266

Subject:Computer application technology

Abstract/Summary:

Support Vector Machine (SVM) is a general machine learning method proposed by Vapnik's research team. It is founded on VC dimension theory and the Structural Risk Minimization principle, and it replaces the traditional idea of dimension reduction with mapping the data into a higher-dimensional space. It has good generalization performance.

With the development of the information age, the volume of data to be processed keeps growing. It is usually very difficult to obtain a complete training set at the outset. People therefore hope that a learning machine can continuously improve its learning precision as data samples accumulate during use. This is the idea of incremental learning.

Compared with the entire training set, the number of support vectors is very small. Support Vector Machine is therefore a powerful tool for incremental learning and is well suited to processing large-scale data sets. Generally speaking, incremental learning is applied in two settings. The first is large databases, such as Web logs. The second is streaming data that changes continuously, such as stock transaction data. Existing incremental learning algorithms are mostly built on decision trees and neural networks, and they have two shortcomings. First, because they lack control over the expected risk on the whole sample set, they overfit the training data easily. Second, because they lack a mechanism for selecting and eliminating training samples, their classification precision suffers considerably. SVM, which is based on Structural Risk Minimization theory, is one of the few learning algorithms that can successfully solve the first problem. However, classical SVM learning algorithms do not support incremental learning directly.

Many incremental learning algorithms based on SVM have been proposed.
Compared with traditional learning methods, they make full use of historical learning results and thus reduce subsequent training time. They also do not need to preserve the historical data, which reduces storage space. In incremental learning, as the training set grows, the equivalence between the support vector set and the entire training set breaks down, so a new SV set must be found. How to train faster using historical training results and how to discard useless historical sample points without losing precision are the important problems that an SVM incremental learning algorithm must address.

Intrusion Detection Systems have received more and more attention as network applications have become widespread. Anomaly detection based on machine learning detects known and even unknown attacks by learning from samples and analyzing anomalies, and it is gradually becoming the mainstream of research. Machine learning can be divided into batch learning and incremental learning. The former determines the parameters of the learning machine from the average error over a sufficiently large set of samples. The latter can learn new samples in real time on the basis of the old samples and establish the learning machine's structure and parameters dynamically. Incremental learning is therefore more appropriate for Intrusion Detection.

The subject of this paper is the incremental learning algorithm of Support Vector Machine and its application in Intrusion Detection. By studying Statistical Learning Theory and Support Vector Machine, this paper discusses the idea of incremental learning based on Support Vector Machine and analyzes the incremental learning process. It proposes a KKT cross-validation SVM incremental learning algorithm and applies it to Intrusion Detection. Experiments verify and compare the performance of the related algorithms.
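
A common simple form of the scheme described above keeps only the support vectors of the current classifier as a stand-in for the historical data and retrains on them together with each new batch. The following is a minimal sketch of that idea, not the algorithm proposed in this paper. It assumes a linear kernel, labels in {-1, +1}, and a bias folded into the weight vector; the trainer is a plain dual coordinate descent solver chosen for brevity, and all function names are illustrative.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, label in {-1, +1})


def train_linear_svm(data: List[Sample], C: float = 1.0, epochs: int = 200):
    """Train a soft-margin linear SVM by dual coordinate descent.

    Returns (w, alphas). The last component of w is the bias.
    """
    # Assumption: append a constant 1 so the bias is learned as a regular weight.
    xs = [list(x) + [1.0] for x, _ in data]
    ys = [y for _, y in data]
    w = [0.0] * len(xs[0])
    alphas = [0.0] * len(xs)
    for _ in range(epochs):
        for i, (x, y) in enumerate(zip(xs, ys)):
            # Gradient of the dual objective with respect to alpha_i.
            grad = y * sum(wj * xj for wj, xj in zip(w, x)) - 1.0
            q_ii = sum(xj * xj for xj in x)  # >= 1 because of the bias feature
            # Newton step on alpha_i, clipped to the box [0, C].
            new_alpha = min(max(alphas[i] - grad / q_ii, 0.0), C)
            step = (new_alpha - alphas[i]) * y
            w = [wj + step * xj for wj, xj in zip(w, x)]
            alphas[i] = new_alpha
    return w, alphas


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Classify x with the weight vector returned by train_linear_svm."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def support_vectors(data: List[Sample], alphas: Sequence[float], tol: float = 1e-8):
    """Return the samples whose multiplier is non-zero."""
    return [s for s, a in zip(data, alphas) if a > tol]


def incremental_update(kept: List[Sample], new_batch: List[Sample], C: float = 1.0):
    """Retrain on the retained support vectors plus the new batch only."""
    merged = kept + new_batch
    w, alphas = train_linear_svm(merged, C)
    return w, support_vectors(merged, alphas)
```

Only the list returned by `support_vectors` has to be stored between updates, which is where the savings in storage and retraining time come from. The weakness of this simple scheme is that an old sample that was not a support vector is lost for good, even if the new batch would have turned it into one.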
The work done in this paper is as follows:

1) It introduces machine learning and Statistical Learning Theory, including Empirical Risk Minimization, VC dimension theory, the Structural Risk Minimization principle and so on.
2) It introduces the main idea of Support Vector Machine and discusses the standard SVM algorithm, including linear and nonlinear Support Vector Machines.
3) It presents the idea of incremental learning and introduces some existing incremental learning algorithms for Support Vector Machine, describing the simple incremental learning algorithm in particular detail.
4) Based on an analysis of the incremental learning process, it proposes a new algorithm for SVM-based incremental learning, called the KKT cross-validation algorithm. The algorithm adopts the idea of equivalence between the original training set and the newly added training set. It considers the training samples that may become new support vectors after incremental learning: the samples that violate the generalized KKT conditions, and the samples that satisfy the generalized KKT conditions but lie near the original classification margin. Useless samples are thus discarded and the useful training samples that carry important information are retained, so that the result of incremental learning accurately reflects the changes in the training set. The algorithm is then improved by considering the geometric distribution of the samples. In the improved algorithm, the new training set contains, in addition to the samples selected by the former algorithm, the added samples that are far from the center of the samples. More samples that contain important classification information are thus kept.
5) It applies the SVM incremental learning algorithm to Intrusion Detection. It first analyzes the feasibility and proposes an intrusion detection model based on Support Vector Machine. This model trains the SVM classifier on a training set and updates the classifier with new training data while making full use of the original classification information. This may greatly improve the efficiency of the SVM classifier.
6) Finally, experiments on the KDD CUP 1999 intrusion detection data set illustrate the performance of the algorithms. Compared with the traditional incremental learning algorithm of Support Vector Machine, the algorithms of this paper improve training time, precision, false positive rate and so on. The experiments also analyze the impact of various parameters and of the threshold value on the algorithms' performance.

Intrusion detection technology covers a wide area, and the research in this paper addresses only a small part of it. Support Vector Machine, which has a unique advantage in learning and pattern recognition from small samples, has broad application prospects in Intrusion Detection. Incremental learning compensates for the lack of prior knowledge and strengthens the generalization ability and detection efficiency of the system. Further work may focus on kernel functions, incremental learning algorithms and the optimization of training features, while avoiding overfitting and increasing detection precision.
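
The sample selection described in item 4 can be illustrated with a small sketch. A new sample enters with a zero multiplier, so it violates the KKT conditions of the current classifier f exactly when y·f(x) < 1. The abstract does not say how "near the margin" is measured, so the band width `delta` below is a hypothetical parameter, as are the function names. A linear decision function f(x) = w·x + b is assumed; with a kernel, the dot product would be replaced by the kernel expansion. The distance-from-center rule of the improved algorithm is not covered here.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, label in {-1, +1})


def margin(w: Sequence[float], b: float, x: Sequence[float], y: int) -> float:
    """Functional margin y * f(x) for the linear classifier f(x) = w.x + b."""
    return y * (sum(wj * xj for wj, xj in zip(w, x)) + b)


def select_candidates(new_samples: List[Sample], w: Sequence[float], b: float,
                      delta: float = 0.2):
    """Split new samples into KKT violators, near-margin samples, and the rest.

    delta is an assumed threshold: samples with 1 <= y*f(x) < 1 + delta are
    treated as lying near the margin and are kept as well.
    """
    violators, near_margin, discarded = [], [], []
    for x, y in new_samples:
        m = margin(w, b, x, y)
        if m < 1.0:
            # Inside the margin or misclassified: violates the KKT conditions.
            violators.append((x, y))
        elif m < 1.0 + delta:
            # Satisfies the KKT conditions but is close to the margin.
            near_margin.append((x, y))
        else:
            # Far outside the margin: unlikely to become a support vector.
            discarded.append((x, y))
    return violators, near_margin, discarded
```

In this sketch only the first two groups would be passed on to retraining. A larger `delta` keeps more samples and costs more training time, which is presumably the kind of trade-off the threshold experiments in item 6 examine.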

Keywords/Search Tags:

Incremental

Related items

1	Research On Key Techniques Of Incremental Object Tracking
2	Study Of Rough Set Theory Based Incremental Algorithms And Its Application
3	Research On Techniques Of Incremental Processing For Big Data Based On Hadoop Platform
4	The Design & Implementation Of TTCN-3 Language Incremental Compiler
5	File-match Based Light-weight Differential Incremental Backup System
6	Research And Application On The Incremental Bayesian Classifier And Incremental Rough Set Algorithm
7	Study of branch and bound for incremental SAT
8	Incremental learning with large datasets
9	Incremental kernel learning algorithms and applications
10	Optimisation dynamique pour l'apprentissage incremental adaptatif des systemes de classification