Study On Adaptive Anomaly Detection Based On Data Mining

Posted on: 2010-09-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F Ren
Full Text: PDF
GTID: 1118360272497306
Subject: Computer system architecture
Abstract/Summary:
Data mining can extract patterns of interest from large datasets. It has therefore been applied to intrusion detection in a large number of research projects, which has greatly promoted the development of intrusion detection. However, data mining-based intrusion detection still faces several problems: poor adaptability and an inability to detect novel attacks; high intrusion detection (ID) modeling cost and slow model updating; and a lack of extensibility, that is, an ID model derived from one computer system cannot readily be adapted to another.

To promote the development of data mining and intrusion detection techniques, and aiming at the essence of these problems, this paper provides new methods and effective approaches for intrusion detection, in both theory and application, in the following aspects:

1. The classification of intrusion detection systems (IDS) is reviewed. The system structure of IDS and the related detection technologies are discussed in detail. A survey of ID modeling technology is also given, and the primary problems of ID modeling are discussed.

2. A novel ID model, AADDM, is put forward. Its design process, its model structure, and its means of collecting and preprocessing network connection records are also given. AADDM filters the noise and attack data out of the source dataset and generates a clean training dataset with a top-down density-based clustering method; builds a lightweight and efficient intrusion detection system with a GA-SVM-based algorithm that selects useful features; and uses an unsupervised self-learning mechanism, incremental density-based clustering, to partition the network behavior set into a normal behavior set and an abnormal behavior set and to generate intrusion detection profiles. Intrusion patterns are extracted automatically from real-time security event data, so the intrusion pattern database can be updated automatically to reflect current conditions.
Besides, training datasets and background knowledge are not needed, so AADDM has the advantage of lower cost. AADDM provides a novel idea for ID research.

3. We propose a training data set generation algorithm that uses a novel top-down clustering method based on region density and a multidimensional index. Multidimensional indexes generally have the inherent clustering property of storing similar objects in the same or adjacent data pages. By taking advantage of this property, our method finds similar objects using only region density information, without incurring the high cost of accessing the objects themselves and calculating the distances among them. First, we give a formal definition of a cluster based on the concept of region contrast partition. Next, we propose the density_pruning_clustering (DP) algorithm. DP employs a branch-and-bound mechanism that improves efficiency by pruning unnecessary search when finding the set of dense regions. To evaluate the proposed algorithm, we conducted extensive experiments. The results show that its accuracy is similar or superior to that of BIRCH, except for exactly spherical clusters, and that its efficiency is far superior to that of BIRCH owing to density-based pruning. For large data sets consisting of 10 million objects, the density_pruning_clustering algorithm reduces the elapsed time by a factor of up to 96 compared with BIRCH. Even when the cost of index creation and maintenance is considered, the proposed algorithm is an order of magnitude more efficient than BIRCH. Further, the improvement in performance becomes more marked as the size of the database increases, making this method more suitable for larger databases.
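The pruning idea in item 3 can be illustrated with a minimal sketch. It is not the DP algorithm itself: a pyramid of per-cell point counts over the unit square stands in for the multidimensional index, a leaf cell is treated as dense when it holds at least `min_count` points, and clusters are taken to be groups of edge-adjacent dense cells. All function names and parameters here are hypothetical.

```python
from collections import Counter, deque

def build_count_index(points, depth):
    """Count points per grid cell at every level 0..depth (a stand-in index)."""
    counts = Counter()
    for x, y in points:
        for level in range(depth + 1):
            n = 1 << level                      # cells per axis at this level
            ix = min(int(x * n), n - 1)
            iy = min(int(y * n), n - 1)
            counts[(level, ix, iy)] += 1
    return counts

def dense_leaves(counts, depth, min_count):
    """Top-down search for dense leaf cells using only the stored counts."""
    dense, visited = set(), 0
    stack = [(0, 0, 0)]                         # (level, ix, iy) of the root region
    while stack:
        level, ix, iy = stack.pop()
        visited += 1
        # Bound: a region with fewer than min_count points cannot contain
        # a dense leaf, so its whole subtree is pruned.
        if counts.get((level, ix, iy), 0) < min_count:
            continue
        if level == depth:
            dense.add((ix, iy))
            continue
        for dx in (0, 1):
            for dy in (0, 1):
                stack.append((level + 1, 2 * ix + dx, 2 * iy + dy))
    return dense, visited

def connect(dense):
    """Group edge-adjacent dense cells into clusters (connected components)."""
    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            cx, cy = queue.popleft()
            component.add((cx, cy))
            for nb in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(component)
    return clusters

if __name__ == "__main__":
    # Two tight groups plus three isolated points (made-up illustration data).
    points = [(0.05 + 0.001 * i, 0.05) for i in range(20)]
    points += [(0.80 + 0.001 * i, 0.80) for i in range(20)]
    points += [(0.5, 0.5), (0.3, 0.9), (0.9, 0.2)]
    counts = build_count_index(points, depth=3)
    dense, visited = dense_leaves(counts, depth=3, min_count=5)
    print(len(connect(dense)), "clusters;", visited, "regions examined")
```

Once the counts are built, the search never touches the points again, which mirrors the claim that only region density information is needed. Sparse regions are discarded at the coarsest level where they fall below the bound, so the isolated points never cause a descent to the leaf level.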
The top-down clustering approach proposed in this paper greatly improves clustering performance for large databases without sacrificing accuracy. We believe that the proposed method will be practically usable in applications such as generating training datasets for intrusion detection.

4. Feature selection is one of the main methods of data preprocessing; it can alleviate the curse of dimensionality, enhance generalization capability, and improve model interpretability. This paper proposes a new feature selection algorithm, called GA-SVM, for building intrusion detection systems. It (1) uses a hybrid of a genetic algorithm and a heuristic search algorithm as the search strategy that specifies a feature subset for evaluation, and (2) uses one-class Support Vector Machines to evaluate the quality of the search results. We separated the KDD1999 intrusion detection dataset into several testing groups. The experimental results show that the approach both speeds up the detection process and improves detection quality.

5. We present an adaptive anomaly detection algorithm using density-based incremental clustering, called ADDBIC. It applies a new statistical method to automatically summarize the normality profiles of the clusters generated by the algorithm. Each normality profile corresponds to one cluster and is composed of two summaries, internal and external. The internal summary contains the properties of the cluster, while the external summary represents the statistics of the noise values around the cluster. All normality profiles are collected and used as a detection model to monitor the target system. Insertion and deletion update algorithms are explored to adjust existing clusters and normality profiles in real time.
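A normality profile of this kind can be sketched as a pair of running statistical summaries that support both insertion and deletion. The sketch below is only an illustration under stated assumptions: the abstract does not say which statistics ADDBIC stores or how it combines the two summaries, so the count, mean, and standard deviation, the `k`-standard-deviation test, and all class and method names are hypothetical.

```python
import math

class Summary:
    """Running count, sum, and sum of squares of one feature's values."""

    def __init__(self):
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0

    def insert(self, x):
        self.n += 1
        self.total += x
        self.total_sq += x * x

    def delete(self, x):
        # Assumes x was inserted earlier; undoes its contribution exactly.
        if self.n == 0:
            raise ValueError("cannot delete from an empty summary")
        self.n -= 1
        self.total -= x
        self.total_sq -= x * x

    @property
    def mean(self):
        return self.total / self.n if self.n else 0.0

    @property
    def std(self):
        if self.n == 0:
            return 0.0
        variance = self.total_sq / self.n - self.mean ** 2
        return math.sqrt(max(variance, 0.0))   # clamp rounding noise

class NormalityProfile:
    """Internal summary of one cluster plus a summary of the noise around it."""

    def __init__(self):
        self.internal = Summary()   # values that belong to the cluster
        self.external = Summary()   # noise values observed near the cluster

    def is_anomalous(self, x, k=3.0):
        # Assumed rule: flag values more than k standard deviations
        # away from the cluster mean.
        spread = max(self.internal.std, 1e-9)
        return abs(x - self.internal.mean) > k * spread

if __name__ == "__main__":
    profile = NormalityProfile()
    for value in [10.0, 11.0, 9.0, 10.0, 10.0]:   # made-up cluster members
        profile.internal.insert(value)
    profile.external.insert(25.0)                 # made-up nearby noise value
    print(profile.is_anomalous(10.5), profile.is_anomalous(25.0))
    # Update in place: no pass over the full training set is needed.
    profile.internal.insert(12.0)
    profile.internal.delete(9.0)
    print(round(profile.internal.mean, 3))
```

Each update costs constant time because only the summaries change. The external summary is only recorded in this sketch and plays no part in the decision rule.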
Because of the density-based nature of the algorithm, an update operation affects existing clusters only in a small neighborhood of the inserted or deleted training instances. The major contributions of this work are twofold. First, initial clusters on the training data set are generated by density-based clustering and adjusted locally in real time. By comparing feature values in the training data set, we find that normal values always concentrate in a small numerical range, while abnormal values are scattered around the normal values. We can therefore distinguish normal from abnormal values by their density relationship. When the detection model is updated by insertion or deletion, feature values are inserted into or deleted from existing clusters. It can be shown that these operations do not greatly change the density relationship of the normal values in existing clusters. We can therefore update the detection model by adjusting a small range of existing clusters instead of retraining on the whole database. This greatly reduces the time cost of updating and allows updates to be done in real time. Second, we use a statistical method to describe detection model generation and attack detection. Once a cluster is generated or modified, the normality profiles of the feature values involved in the clustering are calculated and compared with the online connection records by our statistical method. Because the normality profiles contain only statistical summaries of the existing clustering results, they can be updated and compared much more efficiently. Comparison experiments show that ADDBIC performs better on real-time anomaly detection than other existing adaptive detection algorithms such as ADWICE: on the given data set, it outperforms ADWICE in both false alarm rate reduction and profile updating, two important factors for anomaly detection systems.

6. This paper provides an ontology description method for IDS based on incremental density-based clustering. The method captures detailed descriptions of attack attributes, extracts the concepts and the relationships between them, and represents the knowledge of the IDS domain in OWL. We use the method to give instantiated descriptions of the Mitnick attack and the buffer overflow attack. During intrusion detection, the instance information, the relationships between attacks, and the whole procedure of an attack can be described in detail and generated by the method. By sharing domain knowledge, an ontology-based intrusion detection system can reason over the instance information of attacks, and the attack descriptions give heterogeneous intrusion detection systems a common understanding of that knowledge.

In conclusion, this dissertation has both academic significance and practical value. It enriches research on intrusion detection and provides constructive methods and techniques for it.
Keywords/Search Tags: Intrusion detection, data mining, density-based clustering, adaptive anomaly detection, ontology description