Uncertain Intrusion Detection Framework In Data Stream

Posted on: 2008-09-04  Degree: Master  Type: Thesis
Country: China  Candidate: F Y Yuan
GTID: 2178360212495825  Subject: Computer application technology
Abstract/Summary:
Data streams arise in applications such as decision support, network traffic analysis, Web click streams, energy consumption measurement, sensor network data analysis, and stock trend analysis. A data stream is usually real-time, fast, continuous, and ordered, and it poses new research problems for data mining. Conventional data mining algorithms analyze their objects exactly, whereas data stream mining algorithms must be fast and use little memory to meet real-time requirements. They therefore scan the data only once and keep a synopsis of it, discard scanned data that is not important, and report results on demand. Because of these characteristics of data streams and the limits of data mining techniques, data stream mining algorithms cannot produce exact results; their results are always approximate. Data stream mining has become a focus of knowledge discovery research, and several algorithms for clustering, classification, frequent pattern mining, and regression have been proposed. These algorithms are still immature, however: some give poor results, and others, like ordinary incremental data mining algorithms, do not run in real time.
This paper studies clustering and outlier detection in data streams, with the aim of obtaining useful results quickly. Intrusion detection is used as the experimental setting. An intrusion detection system monitors network traffic and the hosts on a network and alerts administrators when it detects connections that threaten the availability, integrity, or privacy of normal network data. Because network connections are high-volume, noisy, and arrive in real time, they can be treated as a typical data stream.
The paper reviews partition-based, hierarchy-based, density-based, and model-based clustering algorithms, most of which are not suited to data streams. Professor Deyi Li proposed a data mining method based on data fields that performs clustering and outlier detection effectively on static data sets. Inspired by the data field method, this paper proposes a grid-based clustering algorithm. The algorithm uses a grid to store the data field information and detects outliers with reference to the clusters. It maps the data object space onto a multi-dimensional grid space according to the value domains of the attributes. In a data stream setting, the algorithm assigns each arriving data item to the grid space and maintains the clusters dynamically. Each cluster keeps a synopsis of the stream, consisting of its density, center, sum of squares, and set of grid cells. Clusters are merged and split dynamically on the basis of this synopsis to keep the clustering accurate. At the same time, the algorithm uses the clusters to assign each network connection a score that represents how anomalous the connection is. A connection is anomalous if its score exceeds a threshold; otherwise it is normal. The paper lists evaluation measures for intrusion detection and emphasizes two standard ones: true positive rate and false alarm rate. Because these measures cannot evaluate attacks that span multiple network connections, the paper also proposes response time as an extended measure for such attacks. The score of a connection is adjusted dynamically by a weight, which effectively reduces the false alarm rate and the response time.
Because data stream mining usually yields approximate results, the paper analyzes uncertainty in data mining, focusing on feature extraction and the mining algorithm.
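To make the grid idea concrete, the following is a minimal Python sketch of a one-pass grid synopsis with threshold-based scoring. It is not the algorithm of the paper: the class name `GridScorer`, its methods, and the scoring rule (one minus the share of previously seen points that fall in the same cell) are assumptions chosen for illustration, and cluster merging, splitting, and the dynamic weight are left out.

```python
from math import floor


class GridScorer:
    """Toy one-pass grid synopsis (illustrative only, not the paper's algorithm)."""

    def __init__(self, lows, highs, cells_per_dim):
        self.lows = lows            # lower bound of each attribute's value domain
        self.highs = highs          # upper bound of each attribute's value domain
        self.k = cells_per_dim      # number of grid cells along every dimension
        self.counts = {}            # grid cell -> number of points seen in it
        self.total = 0              # number of points seen so far

    def cell_of(self, point):
        """Map a point to grid coordinates using each attribute's value domain."""
        coords = []
        for x, lo, hi in zip(point, self.lows, self.highs):
            frac = (x - lo) / (hi - lo)
            # Clamp so that out-of-range values land in the border cells.
            coords.append(min(self.k - 1, max(0, floor(frac * self.k))))
        return tuple(coords)

    def score(self, point):
        """Assumed score in [0, 1]: sparse cells give scores close to 1."""
        if self.total == 0:
            return 1.0
        return 1.0 - self.counts.get(self.cell_of(point), 0) / self.total

    def update(self, point):
        """Absorb the point into the synopsis; the raw point is not stored."""
        cell = self.cell_of(point)
        self.counts[cell] = self.counts.get(cell, 0) + 1
        self.total += 1

    def process(self, point, threshold):
        """Score the arriving point against the past, then absorb it."""
        s = self.score(point)
        self.update(point)
        return s, s > threshold


# Usage: a stream of similar connections followed by one unusual connection.
scorer = GridScorer(lows=[0.0, 0.0], highs=[10.0, 10.0], cells_per_dim=5)
for _ in range(50):
    scorer.update((1.0, 1.0))
score, is_anomalous = scorer.process((9.0, 9.0), threshold=0.9)
```

Each point is read once and only per-cell counts are kept, which mirrors the single-scan, small-memory constraint described for data stream algorithms.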
The method builds on this uncertainty to obtain usable results. Feature extraction is the most important step in data preprocessing. The features must be extracted in a form that suits the input of the clustering and outlier detection algorithm. In the paper, the backward cloud generator is used to discretize numeric features, converting quantitative values into qualitative ones. The generator is described by three numerical characteristics, expectation, entropy, and hyper-entropy, which represent the distribution of the data.
The paper compares existing intrusion detection systems. Network connections are random and uncertain, so the mining algorithm must meet strict performance requirements. The two approaches, misuse detection and anomaly detection, each have advantages and limitations. For this reason, data mining techniques can be applied to intrusion detection. The framework in the paper combines a misuse detection model with an anomaly detection model. It identifies known attack types quickly and captures unknown attacks through outlier detection. The two methods complement each other and make the framework more effective and precise. The intrusion detection expert system can learn continuously while it operates, which may be of interest for other similar applications.
The experiments use the DARPA intrusion detection dataset provided by MIT Lincoln Laboratory and show that the algorithms in the paper give good analysis results.
In short, the uncertain intrusion detection framework for data streams can mine usable results quickly and in real time.
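The backward cloud generator mentioned above can be sketched as follows. The abstract does not say which variant the paper uses, so this shows one commonly cited form that needs no certainty degrees: expectation from the sample mean, entropy from the mean absolute deviation, and hyper-entropy from the gap between the sample variance and the squared entropy. The function names and the three-label discretization rule in `qualitative_label` are assumptions added for illustration.

```python
import math


def backward_cloud(samples):
    """Estimate (expectation, entropy, hyper-entropy) from numeric samples.

    One commonly cited backward cloud generator without certainty degrees;
    the variant used in the paper is not stated in the abstract.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    ex = sum(samples) / n
    mean_abs_dev = sum(abs(x - ex) for x in samples) / n
    en = math.sqrt(math.pi / 2) * mean_abs_dev
    variance = sum((x - ex) ** 2 for x in samples) / (n - 1)
    # Clamp at zero: the sample variance can fall below En^2 on real data.
    he = math.sqrt(max(variance - en ** 2, 0.0))
    return ex, en, he


def qualitative_label(x, ex, en):
    """Hypothetical discretization: label a value by its distance from Ex."""
    if x < ex - en:
        return "low"
    if x > ex + en:
        return "high"
    return "normal"


# Usage with made-up feature values (for example, connection durations).
ex, en, he = backward_cloud([1.0, 2.0, 3.0, 4.0, 5.0])
label = qualitative_label(4.8, ex, en)
```

The three numbers summarize a numeric feature's distribution, and a rule such as `qualitative_label` then turns each quantitative value into a qualitative one before clustering.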
Keywords/Search Tags: Uncertain