Research On Nominal Data Clustering/Classification Algorithms With Their Applications In Anomaly Detection

Posted on: 2010-06-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z H Li
GTID: 1118360278474876
Subject: Light Industry Information Technology and Engineering
Abstract/Summary:
With the rapid development and wide application of network technologies, computer and network security has become increasingly important; it is now key to national economic development and an important part of national defense. Network security incidents occur frequently, so detecting, preventing, and predicting them, and protecting information systems and the network infrastructure as a whole, is an urgent task. Network anomaly detection for small-scale networks such as local area networks (LANs) has been studied for many years, and a number of intrusion detection systems (IDS) have been built. However, as networks gain wider bandwidth, larger scale, and more complicated behavior, many new characteristics emerge. How to mine the rules in network data effectively, so as to monitor network security events more efficiently, has become the most significant issue in network security; it is key to improving the credibility of IDS and to anomaly detection in large-scale networks. Against this background, we study network data, namely the KDD Cup 1999 dataset, discover some new regularities, and put forward some new viewpoints. Building on these findings, we propose several new algorithms and methods for anomaly detection. The innovations and main research work can be summarized as follows:
1. By analyzing the network data in the KDD Cup 1999 dataset, we make several observations. First, data often appear in homogeneous groups. Second, network data contain a large amount of nominal data. Third, network connection records are heterogeneous. Fourth, nominal data are often unevenly distributed. Finally, intrusion data often lie far from normal network connections; in essence, they are outliers.
Based on these observations, dissimilarity measures for nominal and heterogeneous data are studied, and structural information about data instances and dimension attributes is mined from the dataset to construct clustering clues. Several new algorithms are presented, including the nominal quantum clustering (NQC) algorithm, the clustering with outliers (CO) algorithm, and the structure-based entropy clustering (SEC) algorithm. Clustering-based unsupervised anomaly detection methods built on these new ideas are also given. Experimental comparisons with other methods show that the proposed methods perform promisingly.
2. Clustering aims to study the distribution of instances in scale space, which closely resembles the particle world of quantum mechanics. In quantum mechanics, the probability wave function describes the distribution of particles, and the Schrödinger equation is the main tool for solving for the wave function under given boundary conditions. Once the wave function is determined, the quantum potential serves as the clustering objective function, which determines where the particles are distributed. By analyzing the physical essence of the quantum clustering (QC) algorithm, the scale parameter δ in its wave function is shown to be the corresponding kernel width, which can be estimated to improve the efficiency of QC. Furthermore, for machine learning, quantum mechanics implies two things: that grouping structures inherent in data can be discovered, and that imbalances in distribution weight exist in the real particle world. The former is the core of quantum clustering and is the same mechanism as that used in the fuzzy c-means (FCM) algorithm.
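To make the role of the quantum potential concrete, here is a minimal Python sketch. It assumes the commonly used quantum clustering formulation in which the wave function is a sum of Gaussians centred on the data points; it is not the dissertation's NQC algorithm. The names `quantum_potential`, `points`, and `sigma` are illustrative, and `sigma` plays the role of the scale parameter (kernel width) discussed above.

```python
import math

def quantum_potential(x, points, sigma):
    """Return V(x) - E for a Gaussian wave function psi(x) = sum_i g_i(x).

    Assumed formulation (not taken from the abstract):
        g_i(x)   = exp(-||x - x_i||^2 / (2 * sigma^2))
        V(x) - E = -d/2 + sum_i ||x - x_i||^2 * g_i(x) / (2 * sigma^2 * psi(x))
    where d is the dimension of x. Low values of V mark candidate cluster centres.
    """
    d = len(x)
    psi = 0.0       # wave function value at x
    weighted = 0.0  # sum of squared distances weighted by the Gaussians
    for p in points:
        sq = sum((a - b) ** 2 for a, b in zip(x, p))
        g = math.exp(-sq / (2.0 * sigma ** 2))
        psi += g
        weighted += sq * g
    # Note: psi can underflow to 0.0 when x is very far from every point.
    return -d / 2.0 + weighted / (2.0 * sigma ** 2 * psi)

# Two well-separated toy groups in the plane (made-up data).
points = [(0.0, 0.0), (0.1, 0.0), (-0.1, 0.0),
          (5.0, 5.0), (5.1, 5.0), (4.9, 5.0)]

v_centre = quantum_potential((0.0, 0.0), points, sigma=1.0)
v_between = quantum_potential((2.5, 2.5), points, sigma=1.0)
print(v_centre < v_between)
```

In this toy example the potential is lower at the centre of a group than halfway between the two groups, which is the property a clustering procedure of this kind exploits. A smaller `sigma` tends to produce more minima, and therefore more clusters.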
Because quantum clustering and FCM share this grouping mechanism, a hidden wave function is found to exist in FCM, which yields a quantum-theoretic interpretation of FCM. The second implication, imbalance of distribution weight, matches the distribution of nominal instances, because nominal data are unevenly distributed. Exploiting this similarity between clustering and quantum mechanics, a fuzzy clustering algorithm for nominal data based on quantum mechanics is proposed and applied to anomaly detection. Experimental results show that it performs very well on nominal data and is more efficient than the other algorithms compared.
3. As is well known, the support vector machine (SVM) handles only continuous data, and defining an inner product for nominal data is difficult. By studying the essence of the kernel method and the kernel function in SVM, a new kernel-based inner product for nominal data is given and used to classify nominal data and the heterogeneous KDD Cup 1999 dataset. Experiments show that this efficiently extends SVM to nominal data.
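The abstract does not spell out the proposed kernel. As a hedged illustration of what a kernel-based inner product for nominal data can look like, the Python sketch below uses the overlap (simple matching) kernel as a stand-in: it counts matching attribute values, which equals the dot product of the one-hot encodings of the two records. The function names, the `gamma` parameter, and the sample records are all assumptions made for this example; the record values merely imitate KDD Cup 1999 style protocol, service, and flag fields.

```python
import math

def overlap_kernel(x, y):
    """Count attributes on which two nominal records agree.

    This equals the dot product of the records' one-hot encodings,
    so it is a valid (positive semidefinite) kernel.
    """
    return sum(1 for a, b in zip(x, y) if a == b)

def mixed_kernel(x_nom, x_num, y_nom, y_num, gamma=0.5):
    """Illustrative kernel for heterogeneous records.

    Adds the overlap kernel on the nominal part to an RBF kernel on the
    numeric part; a sum of two valid kernels is again a valid kernel.
    """
    sq = sum((a - b) ** 2 for a, b in zip(x_num, y_num))
    return overlap_kernel(x_nom, y_nom) + math.exp(-gamma * sq)

def gram_matrix(records, kernel):
    """Build the full kernel (Gram) matrix for a list of records."""
    return [[kernel(r, s) for s in records] for r in records]

# Made-up nominal records: (protocol, service, flag).
records = [
    ("tcp", "http", "SF"),
    ("tcp", "smtp", "SF"),
    ("udp", "domain_u", "SF"),
]

for row in gram_matrix(records, overlap_kernel):
    print(row)
```

A Gram matrix built this way could be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.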
Keywords/Search Tags: Clustering algorithm, Nominal data, Heterogeneous data, Kernel-based classification method of nominal data, Anomaly detection, Dissimilarity measure, Quantum theory, Outliers clustering, Structure-based entropy, Clustering clue