Font Size: a A A

Study Of Support Vector Machines And Its Application In Intrusion Detection Systems

Posted on:2005-05-29Degree:DoctorType:Dissertation
Country:ChinaCandidate:C X DongFull Text:PDF
GTID:1118360152971395Subject:Communication and Information System
Abstract/Summary:PDF Full Text Request
Along with the wide application of computer and Internet, the security problem about systems or network becomes more and more important. Intrusion Detection Systems (IDS) are complementarities to security mechanism of network and computer systems, and enhance the protection depth of security. The research of IDS began in 80's of last century, although there is quite great progress, there is no effective mean to protecting the network and computer systems yet. The key problem of IDS is that there are too many data need to be dealt with, and these data are not sufficient and self-contained to the requirement of detection and false alarm rates.Support Vector Machines (SVM) is a novel algorithm of machine learning issued in 90's of last century, which is based on the statistic learning theory. It puts more attention to the information provided by the samples, and fits to solving the small sample problem.The aim of the dissertation is to research the applications of SVM to IDS. The dissertation contains four parts: the improvement of SVM algorithm, the estimation algorithm of the generalization performance of SVM, the method for selecting the parameters of SVM and the IDS based on the SVM.For the improvement of SVM, the tight mutual pair method for pre-extracting support vectors and step method for training SVM are proposed. Since the training procedure of SVM searches the optimal solution of a quadratic program on all the training samples, it requires huge computation and memory space. In fact, only few samples determine the performance of SVM, which are named support vectors. There are many similar characteristics between tight mutual pair in traditional pattern recognition and support vector. Generalized the tight mutual pair to the feature space by Mercer theorem, it can pre-extract support vectors efficiently, and the performance of the SVM trained on the tight mutual pair set is comparable with that trained on all the training samples, while the required computation and memory space of the former decrease sharply. By the KKT condition of solution procedure, samples in training set can be divided into three categories. For the testing samples, there are also three categories divided by the determination value, and the categories corresponds the position of samples to the classification hyper-plane and the angle to the normal vector of the hyper-plane. The effect of different category samples appended on the hyper-plane can be studied, it leads step method for training SVM. The method can not only make the SVM to relearn from the new samples, but also make classification hyper-plane change in an expect direction according with the certain problem.In the estimation of generalization performance, a linear program procedure forSpan method is presented. In the quadratic program solution procedure of Span, the objective function defines an elliptic norm of vectors. It is proved that the function and the constraint condition define a convex program. According with the convergence of convex program and equivalence of norm, it is also proved that the elliptic norm can be substituted by the infinite norm, the new program is still convex, and both optimal procedures are convergent mutually. After certain transformation, a linear program method can be obtained to solve the Span. The error between two methods is very little, and the computation decreases sharply.In the selection of the parameters, optimal and trials-and-errors methods are proposed. According with the affections of support vector to the classification performance, they can be categorized into sensitive and insensitive parameters. So the parameters are updated with different rules. For kernel parameters, the ratio of the number of support vectors to that of training samples is used to measure the variation of generalization performance, and updates the parameters indirectly. For penalty factor, the ratio of the number of bound support vectors to that of support vectors is not the critical rule. So an optimal method for selecting the parameters can be obtained. Exce...
Keywords/Search Tags:Support Vector Machines (SVM), Intrusion Detection Systems (IDS), algorithm improvement, pre-extract support vectors, step training method, classification rule simplification, generalization performance estimation, Span, linearly program
PDF Full Text Request
Related items