
Research On The Key Technologies Of Data Stream Clustering Based Network Service Identification

Posted on: 2014-03-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D Li
GTID: 1268330401463122
Subject: Information security
Abstract/Summary:
With the development of the Internet, the number of network applications has increased rapidly. This growth has improved social efficiency and enriched people's cultural life, but it has also complicated the network environment. Congestion occurs as network bandwidth is consumed by vast amounts of P2P traffic, service quality declines, and network security has become a serious problem. Hence there is an urgent need for network management and monitoring that can optimize network resources, address security problems, improve network transmission capacity, and provide a scientific basis for network expansion. Network service traffic identification is one effective way to address these problems. However, traditional identification techniques rely excessively on port numbers and packet payloads, which limits their ability to handle complex network traffic. Data mining-based identification extracts statistical features of network service traffic and classifies flows with supervised or unsupervised methods. It is better suited to identifying complex network traffic and has become one of the key research directions.

Considering the data stream characteristics of network service flows, our research concentrates on data stream clustering algorithms and a network service traffic identification scheme. The main contents and innovations of this paper are as follows:

Clustering for data streams with arbitrary shape based on an adaptive time weight threshold of the grid: grid techniques feature high processing speed, with a processing time that depends only on the size of the grid. Given the arbitrary shapes and the temporal and spatial tilt features of network data streams, the paper proposes a grid-based clustering algorithm for data streams with arbitrary shape.
The algorithm introduces the concepts of potential dense grid and outlier grid based on a fading function, and defines an adaptive time weight threshold for grids that accounts for both the temporal and spatial tilt features of network service data streams. An online maintenance function periodically detects and deletes ineligible grids, which improves storage and time efficiency. Experiments show that the algorithm can separate clusters with arbitrary shape and spatial tilt from noise, and that it clusters network data streams with higher quality and speed.

Evolution clustering for data streams based on grid density: users may want to know not only the characteristics of network data streams at a specific time, but also their characteristics over a specific time horizon or the evolution of network traffic between different periods. In this paper, a grid-density-based clustering algorithm for evolving data streams is proposed. A density coefficient for each data record is applied to handle the time tilt of network traffic. The pyramid time frame technique is introduced to save snapshots of the grid set at specific times. The algorithm supports clustering at a specific time, clustering over a time horizon, and evolution analysis. Experiments show that the algorithm is robust to noise and performs better in data stream analysis and processing speed.

Semi-supervised network service identification scheme based on a data stream clustering algorithm: a single identification technique cannot analyze network service traffic comprehensively, because mice flows and elephant flows occur in imbalanced proportions and have different properties. In this paper, we use different elephant thresholds for TCP flows and UDP flows, and propose a multi-level network traffic identification system that combines several identification techniques.
In this system, mice flows are identified step by step using port-based, payload-based, and data mining methods, while elephant flows are identified by data mining alone. For data mining-based identification of network service traffic, traditional supervised methods are limited by the training dataset used to train the classifier and are not suitable for real-time traffic identification. Unsupervised methods can find the natural clusters in traffic, but efficiently mapping clusters to service applications remains difficult. Taking the features of network traffic fully into account, this paper presents a semi-supervised network service traffic identification scheme based on a data stream clustering algorithm. The scheme applies a two-phase framework that processes online real-time network traffic in a single-pass scan. It periodically stores the set of micro-clusters in the offline time snapshot database. In response to user requests, the offline component selects a clustering algorithm and the related data from the time snapshot database and generates clusters. The scheme maintains an offline mapping rules database, which is obtained by identifying sampled real-time traffic flows with port number or payload identification techniques and mapping the related micro-clusters to application types. In addition, the paper uses different elephant thresholds to extract sub-flows from TCP and UDP elephant flows. Features of the sub-flows are extracted, and the best feature subset is chosen by a feature selection algorithm.
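
The multi-level identification order described above (mice flows: port, then payload, then data mining; elephant flows: data mining only) might be sketched roughly as follows. This is an illustrative sketch, not the dissertation's implementation: the `Flow` fields, the threshold values, the port and signature tables, and the `by_mining` callable standing in for the clustering-based identifier are all assumptions introduced here, since the abstract gives none of them.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Flow:
    protocol: str    # "TCP" or "UDP"
    dst_port: int
    payload: bytes   # first bytes of the flow payload
    byte_count: int  # total bytes observed for the flow


# Hypothetical per-protocol elephant thresholds in bytes (values are assumed).
ELEPHANT_THRESHOLD = {"TCP": 100_000, "UDP": 50_000}

# Hypothetical lookup tables for the port and payload stages.
PORT_RULES = {53: "DNS", 80: "HTTP"}
PAYLOAD_SIGNATURES = {b"\x13BitTorrent protocol": "BitTorrent"}


def is_elephant(flow: Flow) -> bool:
    """A flow is an elephant flow if it reaches its protocol's threshold."""
    return flow.byte_count >= ELEPHANT_THRESHOLD[flow.protocol]


def by_port(flow: Flow) -> Optional[str]:
    """Port stage: return a label for a known port, else None."""
    return PORT_RULES.get(flow.dst_port)


def by_payload(flow: Flow) -> Optional[str]:
    """Payload stage: return a label if the payload starts with a known signature."""
    for signature, label in PAYLOAD_SIGNATURES.items():
        if flow.payload.startswith(signature):
            return label
    return None


def identify(flow: Flow, by_mining: Callable[[Flow], str]) -> str:
    """Elephant flows go straight to data mining; mice flows try
    port, then payload, and fall back to data mining last."""
    if is_elephant(flow):
        return by_mining(flow)
    return by_port(flow) or by_payload(flow) or by_mining(flow)
```

Here `by_mining` is passed in as a parameter so that any clustering-based identifier can be plugged in without changing the dispatch order.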
Keywords/Search Tags: network service identification, adaptive clustering, evolution analysis, multi-level shunt network traffic identification, semi-supervised method