Key Technologies Research On Time Series Data Mining For Large Scale Network Security Situation Analysis

Posted on: 2011-10-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W C Cheng
GTID: 1118330332987021
Subject: Computer Science and Technology
Abstract/Summary:
Network security situation analysis helps network administrators understand the security state of a large-scale network and supports their decision making, so the topic has attracted attention from both government and academia in recent years. To analyze the security situation of large-scale networks, many data-collection tools have been deployed in backbone networks. Because large-scale networks demand high performance, most of these tools are designed for a dedicated purpose. Instead of the association analysis commonly used in networks of ordinary scale, we can often only extract information from the data these tools produce through statistical analysis, and the time series formed by those statistics as they evolve can reflect changes in risk across the large-scale network. Large-scale network security situation analysis therefore relies heavily on mining network security time series data.

Considering the requirements of large-scale network security situation analysis, we study the mining of network security time series data and conduct experiments on Trojan data produced by the "863-917" network security monitoring platform and botnet data produced by a honeynet. We identify four important problems and study them in depth, focusing on finding special changes and supporting decision making in network security time series data mining. The main contents of this dissertation are organized as follows:

1. Detection of anomalous wave sections in pseudo-periodic network security time series data. Pseudo-periodic time series appear in many large-scale network security applications. Anomalous wave sections usually indicate changes in network security risk that are worth further analysis. Because networks are unstable, we adopt the dynamic time warping (DTW) distance, which tolerates shifts in the data, as the similarity measure between wave sections in pseudo-periodic data, and then flag as anomalous the wave sections that have few similar historical counterparts under that measure. A fast detection algorithm based on a cluster index is proposed to speed up the detection process. Extensive experiments on the Trojan and botnet datasets show that the proposed method is more efficient than the algorithm based directly on DTW, with an acceptable loss of accuracy.

2. Interval differential skyline queries over network security time series data streams based on wavelet synopses. In large-scale network security situation analysis, we need to select, from a large number of time series, the special ones that deserve attention. Because it is based on volume, the existing interval skyline query sometimes cannot satisfy the requirements of network security applications, and a "submerge" phenomenon may occur. We therefore propose the concept of the interval differential skyline, which focuses on the rate of increase of the data, to remedy the shortcomings of the existing interval skyline query. For network security data stream processing, we propose an efficient algorithm that implements the interval differential skyline query at different granularities on top of the commonly used wavelet synopsis. Extensive experiments on multiple kinds of Trojan data from multiple areas show that the proposed method remedies the shortcomings of existing research and achieves high performance.

3. Similar sub-sequence search over multi-dimensional network security time series data. Historical similar sub-sequences can provide decision support to network administrators and can also be used to predict future changes qualitatively. Because recent data within a time window are considered more interesting, and to obtain more useful search results with extra valuable information from that window, this dissertation extends the similar sub-sequence search problem to the multi-dimensional scenario by introducing a data cube model. Moreover, by studying the correlation between cells at neighboring levels of the data cube, the efficiency of the search algorithm is improved while the accuracy of the search results is preserved. Extensive experiments on multi-dimensional Trojan data demonstrate that the proposed method returns more valuable search results and is highly efficient.

4. Prediction of network security time series data. Time series prediction is a long-standing problem of great concern and an important requirement in network security situation analysis. Because network security time series are affected by many factors and exhibit large random perturbations, it is hard to build a suitable prediction model, and the accuracy of classical prediction methods may be unsatisfactory. In this dissertation, we adopt the idea of case-based reasoning (CBR) and introduce the concepts and methods of frequent episodes from the domain of event sequence analysis, providing a new approach to predicting network security time series data. On this basis, we propose two concrete algorithms, one using the mean value feature and one using the trend feature, to handle prediction tasks for different data types. Extensive experiments on the Trojan and botnet datasets demonstrate the high prediction accuracy of the proposed methods for network security time series data.

In summary, this dissertation focuses on time series data mining for large-scale network security situation analysis and studies four key issues in that area. This work has academic and practical value for advancing both the theory and the practice of this research.
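The DTW-based detection idea in the first contribution can be illustrated with a small Python sketch. This is a minimal sketch under stated assumptions, not the dissertation's algorithm: the names `dtw_distance` and `is_anomalous`, the parameters `radius` and `min_neighbors`, and the toy data are hypothetical, and the brute-force scan over the history omits the cluster-index speedup entirely.

```python
from math import inf


def dtw_distance(a, b):
    """Classic dynamic time warping distance with |x - y| as the local cost."""
    m = len(b)
    # prev[j] is the best cost of aligning the processed prefix of a with b[:j].
    prev = [0.0] + [inf] * m
    for x in a:
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(x - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def is_anomalous(section, history, radius, min_neighbors):
    """Flag a wave section that has fewer than `min_neighbors` historical
    sections within DTW distance `radius` (both thresholds are assumptions)."""
    similar = sum(1 for past in history if dtw_distance(section, past) <= radius)
    return similar < min_neighbors


# Toy wave sections (hypothetical data): three historical periods of slightly
# different length, one time-shifted but normal section, and one spike.
history = [[0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0], [0, 1, 2, 2, 1, 0]]
shifted = [0, 1, 1, 2, 1, 0]
spike = [0, 5, 9, 5, 0]

print(is_anomalous(shifted, history, radius=1.0, min_neighbors=2))
print(is_anomalous(spike, history, radius=1.0, min_neighbors=2))
```

The shifted section warps onto every historical section at zero cost, so it is not flagged, while the spike has no historical neighbor within the radius and is flagged. Every query here costs one full DTW computation per historical section, which is the cost a cluster index is meant to reduce.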
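The contrast between a volume-based interval skyline and the interval differential skyline of the second contribution can be sketched as follows. The abstract does not give formal definitions, so this sketch assumes one plausible reading: within an interval, one series dominates another if its first differences are at least as large at every step and strictly larger at some step. The function names and the toy streams are hypothetical, all streams are assumed to have equal length, and the wavelet synopsis and streaming aspects are omitted.

```python
def differences(series, start, end):
    """First differences of series[start:end], one value per adjacent pair."""
    window = series[start:end]
    return [b - a for a, b in zip(window, window[1:])]


def dominates(p, q):
    """p dominates q if p >= q at every position and p > q at some position."""
    return all(x >= y for x, y in zip(p, q)) and any(x > y for x, y in zip(p, q))


def skyline(vectors):
    """Names of the vectors that no other vector dominates."""
    return sorted(
        name
        for name, vec in vectors.items()
        if not any(dominates(other, vec)
                   for other_name, other in vectors.items()
                   if other_name != name)
    )


def interval_skyline(streams, start, end):
    """Volume-based skyline over the raw values in [start, end)."""
    return skyline({name: s[start:end] for name, s in streams.items()})


def interval_differential_skyline(streams, start, end):
    """Skyline over first differences in [start, end) (assumed definition)."""
    return skyline({name: differences(s, start, end) for name, s in streams.items()})


# Hypothetical counts per time unit for three monitored areas.
streams = {
    "big_flat": [100, 100, 101, 101],
    "small_rising": [1, 3, 6, 10],
    "small_flat": [2, 2, 2, 2],
}

print(interval_skyline(streams, 0, 4))
print(interval_differential_skyline(streams, 0, 4))
```

On this toy data the volume-based query returns only `big_flat`, hiding the rapidly growing `small_rising` stream, which is one way to read the "submerge" phenomenon. The differential query returns only `small_rising`.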
Keywords/Search Tags: Time series, Time series data mining, Network security situation analysis, Anomaly detection, Interval skyline, Similar sub-sequence search, Prediction