Adaptive Fuzzy Clustering Algorithm And Its Application In Intrusion Detection

Posted on: 2018-07-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Ren
Full Text: PDF
GTID: 1318330518470157
Subject: Network and network resource management
Abstract/Summary:
Unsupervised clustering algorithms are often used to analyze intrusion detection samples that carry no class labels; the detected data are then judged to be normal or abnormal behavior according to their characteristics. However, as new attack types keep emerging and the volume of information grows explosively, traditional clustering algorithms can no longer meet the needs of intrusion detection systems and must be further enriched and improved in both theory and method. The fuzzy C-means algorithm (FCM) seeks the optimal solution of a constrained nonlinear programming problem. It is widely applied in intrusion detection because it is simple to implement, easy to extend, and fast to converge. FCM still has several drawbacks, however:

(1) FCM works well on low-dimensional data but is ineffective on high-dimensional data, which suffer from the 'dimension effect'. In practice, among dozens of features, many are irrelevant or redundant and contribute little or no information to clustering. Selecting the optimal feature subset is therefore a necessary precondition for cluster analysis.

(2) FCM requires preset parameters, such as the number of clusters and the fuzzy weighting exponent. These are closely tied to the characteristics of the dataset and directly affect the final clustering result. Setting them manually, by experiment or by expert experience, makes the clustering result somewhat one-sided and subjective and may degrade the performance of the algorithm.

(3) FCM assumes that all features contribute equally to clustering, which is not the case in real applications. If the contribution of each feature is assigned unreasonably, the clustering result can deviate greatly from the correct one.

(4) FCM is designed for continuous data, whereas intrusion detection datasets contain both continuous and discrete features. If FCM is applied directly without any improvement, its clustering accuracy drops.

These deficiencies greatly limit the application of FCM in intrusion detection, so this thesis studies and improves FCM. The main contents are as follows.

(1) Feature selection, to solve the clustering problem in high-dimensional space. An optimal feature subset reduces the computation time of the clustering algorithm and improves both the intelligibility and the accuracy of the clustering result. This thesis therefore proposes a feature selection algorithm based on neighborhood rough sets and a genetic algorithm. The neighborhood rough set model extends the equivalence relation of discrete space to continuous space, but its neighborhood parameter is generally set by hand. First, the concept of the class average distance of decision attributes is proposed so that the neighborhood can be calculated automatically from the characteristics of the dataset. Second, the attribute significance of the neighborhood rough set is improved, because the original measure considers only the impact of a single attribute on the decision and ignores the dependency between that attribute and the others. The improved significance is then used to construct the fitness function of the genetic algorithm, which aims for a larger average attribute significance with respect to the decision attributes and as few attributes as possible. In addition, the crossover rate and the mutation rate are calculated from the selection frequency of each feature and from the improved attribute significance, respectively. Finally, the genetic algorithm selects the optimal feature subset. To verify the feasibility of the algorithm, experiments were run on the KDD CUP 99 dataset; the results showed that FCM achieved higher accuracy on the feature subset selected by the proposed algorithm.

(2) Self-adaptive determination of the number of clusters, to solve the problem that this parameter is difficult to set in advance. Cluster validity analysis can determine the optimal number of clusters automatically; its key parts are the upper search bound and the cluster validity index. Because clustering centers have high local density and lie relatively far from each other, this thesis first proposes an initial clustering center selection algorithm based on local density. The algorithm finds better initial clustering centers, which partly overcomes the sensitivity of FCM to initialization, and at the same time yields the upper search bound for the number of clusters, namely the number of centers found, which avoids relying on an empirical rule. Then, based on fuzzy compactness within clusters and separation between clusters, a new fuzzy cluster validity index is proposed, with a penalty function that restrains the value of the index from monotonically decreasing and reaching 0 when nc ?. On this basis, a self-adaptive FCM that determines the optimal number of clusters is designed. The experimental results showed that the algorithm not only obtained the optimal number automatically but also sped up the convergence of FCM and reduced the number of FCM iterations.

(3) Self-adaptive search for the optimal fuzzy weighting exponent, to solve the problem that this parameter has to be assigned manually. The fuzzy weighting exponent m is an important parameter of FCM and is closely related to the performance of the algorithm. First, an improved fuzzy correlation degree is proposed to measure the relevance between clusters, and a new cluster validity function based on it is defined to evaluate the quality of the fuzzy partition. Then a self-adaptive FCM for the optimal value of m is proposed, which uses the global search ability of an improved particle swarm algorithm to find both the final clustering centers and the optimal fuzzy weighting exponent automatically. The improved particle swarm algorithm updates velocity and position using a dynamic inertia weight and dynamic learning factors, and introduces the mutation operator of the genetic algorithm to maintain particle diversity and prevent premature convergence. The experimental results showed that the improved FCM calculated the optimal value of m automatically and also achieved better clustering results.

(4) Self-adaptive feature weighting on mixed-feature datasets, to solve the problem that features contribute unequally. Rough sets and fuzzy sets are complementary in dealing with uncertain data, so this thesis combines the rough set model with FCM. In the resulting algorithm, the value of the rough set threshold ? is critical and is usually set manually. Shadowed set theory is therefore used to obtain the optimal value of ? automatically, guiding the clustering by minimizing the objective function. Based on rough set and shadowed set theory, a fuzzy distribution centroid is defined to represent the clustering center of discrete features, so that FCM is extended to cluster data with both continuous and discrete features. Then, considering that features contribute differently to each cluster, a new weighted objective function is constructed according to the principle of fuzzy compactness and separation. Because learning the feature weights is the key step in feature-weighted FCM, this thesis treats the feature weight as a variable optimized during clustering and proposes a self-adaptive mixed-feature weighted FCM. The experimental results showed that the algorithm can be applied effectively to heterogeneous mixed-feature datasets.
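
For orientation, the baseline that all four contributions modify is the standard FCM iteration, which alternates between updating the cluster centers and updating the membership matrix. The block below is a minimal sketch of that textbook procedure only, not of the thesis's improved algorithms. The function name `fcm`, its parameters, the random initialization, and the stopping rule are assumptions made for illustration; the number of clusters `c` and the fuzzy weighting exponent `m` are exactly the parameters that the abstract says must otherwise be preset by hand.

```python
import numpy as np

def fcm(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Textbook fuzzy C-means (illustrative sketch, not the thesis's variant).

    X: (n, d) array of continuous features; c: number of clusters;
    m: fuzzy weighting exponent (> 1). Returns (centers, U).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Assumed initialization: random memberships, each row normalized to sum to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Each center is the mean of the samples weighted by u_ij^m.
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Squared Euclidean distances, shape (n, c); the floor avoids division by zero.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)
        # Membership update: u_ij is proportional to d_ij^(-2/(m-1)), normalized per sample.
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

Calling `centers, U = fcm(X, c=2)` on an array of continuous features returns the centers and an `(n, c)` membership matrix whose rows sum to 1. Every feature enters the distance with equal weight and discrete features are not handled, which matches drawbacks (3) and (4) listed in the abstract.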
Keywords/Search Tags: Fuzzy C-means Algorithm, Feature Selection, Cluster Validity, Fuzzy Weighting Exponent, Feature Weighting, Intrusion Detection