Research Of The Algorithm For Intrusion Detection Based On Semi-supervised Clustering

Posted on: 2008-10-28
Degree: Master
Type: Thesis
Country: China
Candidate: W R Li
GTID: 2178360215972491
Subject: Applied Mathematics
Abstract/Summary:
Information systems need active protection measures. Intrusion detection, which actively protects systems against attacks by hackers, is a technique that has emerged over the past two decades. Traditional intrusion detection algorithms based on supervised learning achieve high detection rates and low false positive rates, but they cannot detect unknown attacks and require data that are correctly labeled as normal or anomalous. Network environments produce large volumes of data, and labeling them correctly, especially new and unknown attacks, is hardly feasible. When unsupervised learning is applied instead, clustering-based intrusion detection algorithms can detect unknown attacks with high detection rates, but their false positive rates are also high. The paper therefore proposes an intrusion detection algorithm based on semi-supervised clustering.

Semi-supervised learning is currently one of many active research topics; it exploits the joint probability distribution of labeled and unlabeled data to improve classifier performance. The proposed algorithm uses a small amount of labeled data to generate seed clusters that initialize the algorithm and then guide the clustering process, so that both known and unknown attacks can be detected. Because labeled data are scarce in network environments, the constraints derived from them should, where possible, be selected actively as the most informative ones rather than at random, in order to make the most of the limited supervision available. In that case, fewer constraints are needed to improve clustering accuracy significantly.

The paper systematically investigates the basic theory of intrusion detection systems, introduces the definition of intrusion detection, and analyzes intrusion detection models, the current state of research, and its open problems.
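The abstract says a few labeled records are used to generate seed clusters that initialize the clustering and then guide it, but it does not give the exact procedure. A common way to do this is Seeded K-Means: each centroid starts at the mean of its labeled seed points, and ordinary k-means then runs over all data. The sketch below assumes that technique, Euclidean distance on already-numeric features, and a hypothetical `seeds` mapping from cluster label (for example 0 = normal, 1 = attack) to the indices of labeled points; it is an illustration, not the thesis's ACKID implementation.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def seeded_kmeans(
    data: List[Point],
    seeds: Dict[int, List[int]],
    max_iter: int = 100,
) -> Tuple[List[int], Dict[int, List[float]]]:
    """Seeded K-Means sketch (hypothetical helper, not the thesis code).

    `seeds` maps each cluster label to the indices of its labeled points.
    Centroids start at the seed means; after that the loop is plain k-means
    over every point, labeled or unlabeled.
    """
    centroids = {c: _mean([data[i] for i in idx]) for c, idx in seeds.items()}
    assign: List[int] = [-1] * len(data)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assign = [
            min(centroids, key=lambda c: _dist2(p, centroids[c])) for p in data
        ]
        if new_assign == assign:
            break  # converged: no point changed cluster
        assign = new_assign
        # Update step: move each centroid to the mean of its members
        # (an emptied cluster keeps its previous centroid).
        for c in centroids:
            members = [data[i] for i, a in enumerate(assign) if a == c]
            if members:
                centroids[c] = _mean(members)
    return assign, centroids


if __name__ == "__main__":
    data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
    # One labeled "normal" record (index 0) and one labeled "attack" record (index 3).
    labels, _ = seeded_kmeans(data, {0: [0], 1: [3]})
    print(labels)  # [0, 0, 0, 1, 1, 1]
```

Seeding from labeled means fixes which cluster corresponds to which class before any unlabeled data are considered, which is what lets a handful of labels steer the clustering.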
To address the problems of clustering-based intrusion detection algorithms, the paper proposes an intrusion detection algorithm based on semi-supervised clustering, named ACKID. The paper applies an active learning strategy to the semi-supervised clustering process: active learning queries constraints on labeled and unlabeled data, using FarthestFirst to label the unlabeled data.

The KDD Cup99 datasets are standard benchmarks for evaluating intrusion detection algorithms. The paper uses them to analyze the evaluation process of the ACKID algorithm: it adopts the ROC curve as the evaluation criterion for ACKID, analyzes the attribute features of network data, preprocesses the data, and analyzes the results.

The experimental results demonstrate that ACKID, which can generalize to unknown intrusions, is able to detect unknown attacks; that using labeled data and constraints improves its detection rate and lowers its false positive rate; and that adopting active learning allows ACKID to acquire the most useful supervised information for detecting unknown attacks.
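The abstract states that active learning uses FarthestFirst to choose which records to query, without detailing the query loop. The sketch below assumes a setting modeled on the farthest-first "explore" idea from the active pairwise-constrained clustering literature: points are visited in farthest-first order, and a hypothetical oracle `same_class(i, j)` (standing in for a human analyst) answers whether two records share a class, which yields must-link or cannot-link constraints. The names `farthest_first_order`, `explore`, `same_class`, and `max_queries` are illustrative assumptions, not taken from the thesis.

```python
from typing import Callable, List, Sequence


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_order(data: List[Sequence[float]], k: int, start: int = 0) -> List[int]:
    """Indices of up to k points picked by farthest-first traversal.

    Each new point is the one whose distance to its nearest already-chosen
    point is largest, so early picks spread across the whole data space.
    """
    chosen = [start]
    chosen_set = {start}
    # nearest[i] = squared distance from point i to its closest chosen point
    nearest = [_dist2(p, data[start]) for p in data]
    while len(chosen) < min(k, len(data)):
        candidates = [i for i in range(len(data)) if i not in chosen_set]
        nxt = max(candidates, key=lambda i: nearest[i])
        chosen.append(nxt)
        chosen_set.add(nxt)
        nearest = [min(d, _dist2(p, data[nxt])) for d, p in zip(nearest, data)]
    return chosen


def explore(
    data: List[Sequence[float]],
    k: int,
    same_class: Callable[[int, int], bool],
    max_queries: int,
) -> List[List[int]]:
    """Build up to k disjoint neighborhoods from oracle answers (hypothetical).

    Each visited point is compared with one member of every existing
    neighborhood. A "same class" answer is a must-link (the point joins that
    neighborhood); "different" for all of them means cannot-links with every
    neighborhood, so the point starts a new one.
    """
    neighborhoods: List[List[int]] = []
    queries = 0
    for i in farthest_first_order(data, len(data)):
        if len(neighborhoods) == k:
            break
        joined = False
        for hood in neighborhoods:
            if queries == max_queries:
                return neighborhoods  # budget spent before point i was resolved
            queries += 1
            if same_class(i, hood[0]):  # must-link
                hood.append(i)
                joined = True
                break
        if not joined:
            neighborhoods.append([i])  # cannot-link with every neighborhood
    return neighborhoods


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.5, 0.0), (6.0, 6.0), (6.2, 6.0)]
    truth = [0, 0, 1, 1]  # hypothetical ground truth: 0 = normal, 1 = attack

    def oracle(i: int, j: int) -> bool:
        return truth[i] == truth[j]

    print(farthest_first_order(data, 4))  # [0, 3, 1, 2]
    print(explore(data, 3, oracle, 10))   # [[0, 1], [3, 2]]
```

Visiting points farthest-first tends to reach a representative of each class after very few queries, which matches the abstract's point that actively chosen constraints are more informative than randomly chosen ones. The resulting neighborhoods could then serve as seeds for a seeded clustering step.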
Keywords: intrusion detection, semi-supervised clustering, active learning, ROC curve