
Research And Applications On Intrusion Detection Based On Support Vector Machines For Imbalanced Datasets

Posted on: 2012-08-24 | Degree: Master | Type: Thesis
Country: China | Candidate: M W Chen | Full Text: PDF
GTID: 2178330335962923 | Subject: Control Science and Engineering
Abstract/Summary:
In machine learning, pattern recognition and intelligent information processing, we often face classification problems on imbalanced datasets, in which one class has far fewer samples than the others. Examples include outlier analysis, intrusion detection, fraud detection, video surveillance, fault diagnosis and medical diagnosis. On such datasets, traditional classifiers usually perform much worse on minority samples, because their decisions are biased toward the majority class. In practical applications such as intrusion detection, however, classification accuracy on minority samples matters more. One of the most important problems in imbalanced classification is therefore to avoid allocating an overly large decision region to the majority class. Researchers in machine learning have done a great deal of work on imbalanced classification, and the techniques proposed so far fall into two groups: one preprocesses the data, reducing the imbalance by altering the distribution of the training samples; the other modifies the learning algorithm to suit imbalanced datasets.

The Support Vector Machine (SVM) is a classifier based on statistical learning theory and Structural Risk Minimization, and it is one of the most commonly used methods in academia and industry owing to its strong generalization ability. However, the performance of the standard SVM has been shown to degrade on imbalanced datasets. Some researchers have proposed algorithms to deal with this, but these methods usually have high computational complexity, cannot obtain the global optimum, and depend on the properties of the original datasets.

This dissertation attempts to address these problems and focuses on Support Vector Machines for imbalanced datasets, with intrusion detection as the application. The main contents and contributions of this dissertation are as follows.

(1) First, the fundamentals of machine learning and the background of imbalanced datasets classification are introduced, and the state of the art in imbalanced datasets classification is reviewed. We then introduce the principle of network intrusion detection and describe in detail the datasets used in this dissertation.

(2) We introduce the fundamental concepts of statistical learning theory, in particular the principle of Structural Risk Minimization, and the implementation of SVM. We then apply the standard SVM to the intrusion detection datasets and analyze the experimental results.

(3) The commonly used data-level methods for imbalanced datasets classification are introduced. We then apply SMOTE, one of the most popular oversampling algorithms, and Cluster-based Undersampling, a new method proposed in this dissertation, to oversample and undersample the original datasets respectively. An empirical study shows that the standard SVM achieves comparable classification performance on the datasets resampled with SMOTE and with Cluster-based Undersampling, and that in both cases it performs better than on the original datasets.

(4) Researchers have proposed algorithms that improve the standard SVM for imbalanced datasets classification, but these methods usually have high computational complexity, cannot obtain the global optimum, and depend on the properties of the original datasets. To address these problems, a new method called weighted-SVM is proposed, based on reducing the contributions of majority samples, and it is applied to intrusion detection. Theoretical analysis and empirical results demonstrate that the proposed algorithm performs better than the standard SVM without increasing the computational complexity.
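
The idea of reducing the contribution of majority samples can be illustrated with a generic class-weighted linear SVM. The sketch below is not the weighted-SVM algorithm of the dissertation, whose details the abstract does not give. It assumes a plain hinge loss in which each sample's penalty is scaled by a per-class weight, weights chosen inversely to class frequency, and training by full-batch subgradient descent. All function names, the synthetic two-Gaussian data and the hyperparameters are hypothetical.

```python
import random


def class_weights(labels):
    """Weight each class inversely to its frequency (an assumed heuristic)."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return {c: n / (len(counts) * k) for c, k in counts.items()}


def decision(w, b, x):
    """Linear decision value w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def weighted_hinge_loss(w, b, X, y, weights, lam):
    """L2 regularizer plus the mean of class-weighted hinge losses."""
    total = sum(weights[yi] * max(0.0, 1.0 - yi * decision(w, b, xi))
                for xi, yi in zip(X, y))
    return 0.5 * lam * sum(wj * wj for wj in w) + total / len(X)


def train_weighted_svm(X, y, weights, lam=0.01, epochs=200, lr=0.05):
    """Full-batch subgradient descent on the weighted hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the regularizer
        gb = 0.0
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1.0:  # margin violated
                c = weights[yi] / n  # majority samples get a smaller c
                for j in range(d):
                    gw[j] -= c * yi * xi[j]
                gb -= c * yi
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(w, b, x):
    return 1 if decision(w, b, x) >= 0 else -1


def minority_recall(w, b, X, y):
    """Fraction of positive (minority) samples predicted as positive."""
    positives = [xi for xi, yi in zip(X, y) if yi == 1]
    return sum(predict(w, b, xi) == 1 for xi in positives) / len(positives)


def make_data(n_major=200, n_minor=10, seed=0):
    """Synthetic imbalanced data: two overlapping 2-D Gaussian clouds."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_major)]
    X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(n_minor)]
    return X, [-1] * n_major + [1] * n_minor


if __name__ == "__main__":
    X, y = make_data()
    uniform = {-1: 1.0, 1: 1.0}  # standard SVM: every sample counts equally
    balanced = class_weights(y)  # down-weights the majority class
    for name, cw in (("uniform", uniform), ("balanced", balanced)):
        w, b = train_weighted_svm(X, y, cw)
        print(name, "minority recall:", minority_recall(w, b, X, y))
```

Scaling the hinge penalty per class leaves the optimization problem the same size, which is consistent with the abstract's claim that weighting need not increase computational complexity. How the dissertation actually chooses its weights is not stated here.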
Keywords/Search Tags:Imbalanced datasets, support vector machines, machine learning, intrusion detection, classification learning, statistical learning theory