
Research On The Applications Of Data Mining Technique In Intrusion Detection System

Posted on: 2009-07-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L J Li
Full Text: PDF
GTID: 1118360278966427
Subject: Computer application technology
Abstract/Summary:
Data mining is a technique for extracting useful knowledge from existing data sets. Research over the past ten years has shown that applying data mining techniques to intrusion detection systems (IDS) is important for selecting features effectively, building sound detection models, improving detection efficiency, and lowering both the false positive rate and the false negative rate.

When data mining is applied to IDS, many algorithms are available, but no single algorithm suits all circumstances, so algorithm research has not yet produced authoritative results. Meanwhile, many studies emphasize theoretical and technical aspects and neglect the influence of algorithm complexity on detection efficiency. In addition, most mature IDS products use rule-based detection, which matches each packet exactly against a rule. If the rules are too general or too specific, many false alarms or missed detections result, which reduces the accuracy of intrusion detection.

Therefore, based on the research project "Research on the Intrusion Detection Technology Based on Data Mining" (02SJD520002), sponsored by the Education Bureau of Jiangsu Province, this dissertation studies algorithms suited to IDS, including feature selection, numerosity reduction, and clustering algorithms, with the goals of matching the characteristics of IDS data sources, reducing algorithm complexity, and improving efficiency. It also studies intrusion detection methods based on data mining techniques, with the goals of increasing flexibility and reducing both the false positive rate and the false negative rate.

Based on the characteristics of the data examined by an IDS, this dissertation proposes a Multi-time Fuzzy Iterating Feature Selection Algorithm adapted to IDS and a Correlation Measure-Based Feature Selection Algorithm for IDS.
The Multi-time Fuzzy Iterating Feature Selection Algorithm consists of three steps: searching for feature subsets in the feature space, evaluating every candidate feature subset, and classification. A corresponding search algorithm and evaluation function are designed for it. The algorithm eliminates redundant features through repeated iteration to obtain a high-precision set of feature values, and uses fuzzy logic to obtain a value range that meets the required precision. Because it operates only on datasets, it can analyze data more objectively than algorithms that rely on domain knowledge. Simulation experiments and analysis are performed on the KDD Cup 99 data set, and the experimental results are compared with feature visualization results. The results indicate that the algorithm selects features effectively on IDS datasets.

The Correlation Measure-Based Feature Selection Algorithm applies fuzzy processing to feature values, calculates the degree of feature correlation, sorts the features in descending order of that degree, and then selects features based on the resulting sequence. Its validity is verified by experiments on a classifier-based assessment system with the KDD Cup 99 dataset.

To improve the mining efficiency of data classification in IDS, this dissertation also proposes a Numerosity Reduction Algorithm Adapting to the Data Classification in IDS, which uses value ranges to reduce the number of feature values and expands an isolated point into a region in order to predict similar behavior. Experiments with decision tree algorithms on the KDD Cup 99 dataset show that this algorithm can reduce the time complexity and increase the classification accuracy of existing classification algorithms.

Clustering is widely used in the intrusion detection phase.
In this dissertation, a hierarchical fuzzy clustering (HFC) algorithm is put forward to overcome the limitations of the classical fuzzy C-means (FCM) algorithm. HFC quickly discovers highly concentrated data areas using agglomerative hierarchical clustering, analyzes and merges these areas, and then uses an evaluation function to find the optimal clustering scheme. The experimental results indicate that HFC achieves higher clustering precision and a stronger ability to exclude noise. The applicability of HFC to IDS is analyzed through experiments on the KDD99 dataset.

To improve the detection ability of rule-based IDS, this dissertation puts forward an intrusion detection method based on CBR (Case-Based Reasoning). The steps for implementing CBR are described, several heuristic methods for designing and constructing a case base from rules are proposed, and a CBR engine and its case matching algorithms are designed. Finally, an experiment based on Snort rule sets, an attack platform, and an offline detection system, together with an experiment based on online packets, verifies the effect of CBR in enhancing the detection ability of rule-based IDS.

Finally, the work is summarized, its shortcomings are analyzed, and directions for further research are given. This dissertation contributes useful research on the application of data mining techniques in intrusion detection systems.
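
The ranking idea behind correlation-based feature selection (score each feature, sort in descending order of the score, keep the leading features) can be illustrated with a minimal Python sketch. This is not the dissertation's algorithm: the abstract does not specify its correlation measure or its fuzzy processing step, so the sketch assumes the absolute Pearson correlation between each feature and a numeric class label, and it omits fuzzification entirely. The function names, the feature names, and the toy data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 when either sequence is constant (correlation undefined).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_features(rows, labels, names):
    """Return (name, score) pairs sorted by descending score.

    ASSUMPTION: the score is |Pearson correlation| between a feature column
    and the label; the dissertation's actual measure is not given.
    """
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores.append((name, abs(pearson(column, labels))))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def select_top_k(rows, labels, names, k):
    """Keep the names of the k highest-ranked features."""
    return [name for name, _ in rank_features(rows, labels, names)[:k]]


# Toy records (hypothetical values): label 1 marks an attack, 0 normal traffic.
names = ["duration", "src_bytes", "flag_code"]
rows = [
    [1.0, 10.0, 3.0],
    [2.0, 200.0, 3.0],
    [3.0, 12.0, 3.0],
    [4.0, 210.0, 3.0],
]
labels = [0, 1, 0, 1]

selected = select_top_k(rows, labels, names, k=2)
print(selected)
```

In this toy data the constant column receives a score of zero and is ranked last, which mirrors the general goal of discarding features that carry no information about the class. A real evaluation would score the selected subset with a classifier, as the abstract describes.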
Keywords/Search Tags: Data Mining, Intrusion Detection System, Feature Selection, Hierarchical Clustering, CBR