
Research On Traffic Identification System Based On Classification Algorithm And Cluster Algorithm

Posted on: 2011-03-30 | Degree: Master | Type: Thesis
Country: China | Candidate: Y T Cui
GTID: 2178360308461155 | Subject: Communication and Information System
Abstract:
The Internet has become an integral and important part of people's lives and economic activities. To monitor whether the network is operating and being maintained safely, efficiently, and stably, the features and categories of network traffic must be carefully analyzed and studied. Such analysis is very important for understanding the network's real-time operating status and behavior and for locating network failures promptly; it also guides the design of efficient network systems, the reconfiguration of network performance facilities, and the provision of services to different network customers. All of this must be built on network traffic identification. Because current network traffic and its patterns are much more complex than in the past, many traditional technologies no longer suit the situation, and traditional traffic identification technology, especially application-layer traffic identification, faces enormous challenges. The new types of service traffic have the following characteristics. A large number of Web-based applications have been developed and widely adopted, and their number will continue to grow. Much of the new service traffic uses private application-layer protocols whose formats and operation are complex and difficult to understand. These new applications use irregular port numbers, and many of them use a temporary port number greater than 1024 as their default port. The default ports of many services are not registered in the IANA port list, and many services developed for users in a particular region never register their port numbers with IANA.
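Port-based identification, the traditional approach that these characteristics undermine, amounts to a table lookup. The Python sketch below is illustrative only: `WELL_KNOWN_PORTS` is a hypothetical seven-entry subset of the IANA registry and `classify_by_port` is a made-up helper. It shows why a flow between two unregistered or dynamically chosen ports cannot be labeled this way.

```python
# Hypothetical excerpt of well-known port assignments (a small subset of the IANA registry).
WELL_KNOWN_PORTS = {
    21: "ftp",
    22: "ssh",
    25: "smtp",
    53: "dns",
    80: "http",
    110: "pop3",
    443: "https",
}


def classify_by_port(src_port: int, dst_port: int) -> str:
    """Return the service of the first endpoint whose port appears in the table."""
    for port in (dst_port, src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    # Unregistered, private, or dynamically chosen ports cannot be identified this way.
    return "unknown"


if __name__ == "__main__":
    flows = [
        (51514, 80),     # client on an ephemeral port talking to a web server
        (49152, 53),     # DNS query
        (40123, 52000),  # hypothetical P2P peers, both on dynamic ports
    ]
    for src, dst in flows:
        print(src, dst, classify_by_port(src, dst))
```

For example, `classify_by_port(51514, 80)` returns `"http"`, while the flow between ports 40123 and 52000 returns `"unknown"`.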
Many P2P and streaming-media applications also use dynamic port numbers to communicate between nodes.
In short, because of the complexity of network traffic and its patterns, developing new and efficient network traffic identification techniques has become an international research hotspot in recent years, and the subject has great and far-reaching significance. This dissertation studies the principles of machine learning, data mining, and feature selection algorithms, surveys a variety of network traffic identification algorithms, designs two network traffic identification systems, one based on classification and one based on clustering, and analyzes and compares the two. The main work of the dissertation is summarized as follows:
1. Systematically surveyed the state of network traffic identification technology in China and abroad.
2. Systematically described, analyzed, and compared a variety of network traffic identification techniques, and described the principles of machine learning, data mining, and feature selection algorithms.
3. Because port-based identification has relatively low accuracy and payload-based identification is too costly to deploy widely, proposed identifying traffic from the characteristic flow features that applications exhibit on the network. Two traffic identification systems are put forward: one based on a classification algorithm, which combines the strengths of port-number identification and transport-layer flow-feature identification, and one based on a clustering algorithm.
4. Through traffic collection and testing, evaluated the performance of the two systems in terms of true positive rate, model-building time, testing time, conciseness of the model description, CPU utilization, and memory consumption.
5. Through a comprehensive assessment, compared the two systems on identification accuracy, real-time performance, adaptability to changing port numbers, CPU utilization, and memory consumption, and identified each system's advantages, disadvantages, and suitable application scenarios.
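The abstract does not name the algorithms, the flow features, or how port information is combined with flow features, so the Python sketch below is a hypothetical illustration of the two kinds of system, not the dissertation's method. It substitutes a nearest-centroid classifier for the classification system and plain k-means for the clustering system, assumes a "trust a well-known port first, otherwise use flow features" rule for the hybrid, and uses invented toy flows described by mean packet size and mean inter-arrival time. The true positive rate is computed per class as TP / (TP + FN).

```python
import math
from collections import Counter, defaultdict

# Hypothetical port table: the hybrid classifier trusts these ports first.
WELL_KNOWN_PORTS = {80: "web", 443: "web"}


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(column) / len(points) for column in zip(*points))


def train_nearest_centroid(samples):
    """samples: list of (features, label) pairs. Returns {label: centroid}."""
    groups = defaultdict(list)
    for features, label in samples:
        groups[label].append(features)
    return {label: centroid(pts) for label, pts in groups.items()}


def classify(model, features, port):
    """Assumed hybrid rule: use a well-known port if present, else the nearest centroid."""
    if port in WELL_KNOWN_PORTS:
        return WELL_KNOWN_PORTS[port]
    return min(model, key=lambda label: math.dist(model[label], features))


def kmeans(points, k, iterations=20):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    assignment = []
    for _ in range(iterations):
        assignment = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the previous center if a cluster empties
                centers[j] = centroid(members)
    return assignment


def label_clusters(assignment, labels):
    """Map each cluster index to the majority label among its labelled members."""
    votes = defaultdict(Counter)
    for cluster, label in zip(assignment, labels):
        votes[cluster][label] += 1
    return {cluster: counts.most_common(1)[0][0] for cluster, counts in votes.items()}


def true_positive_rate(actual, predicted, cls):
    """TP / (TP + FN) for one class; 0.0 if the class never occurs."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == cls and p == cls)
    fn = sum(1 for a, p in zip(actual, predicted) if a == cls and p != cls)
    return tp / (tp + fn) if tp + fn else 0.0


# Invented toy flows: (mean packet size in bytes, mean inter-arrival time in ms).
TRAIN = [((600.0, 40.0), "web"), ((650.0, 35.0), "web"),
         ((1300.0, 5.0), "p2p"), ((1400.0, 4.0), "p2p")]
# Test flows: (features, server port, true label). The third is a bulk HTTPS download.
TEST = [((620.0, 38.0), 443, "web"), ((700.0, 30.0), 8080, "web"),
        ((1300.0, 4.0), 443, "web"), ((1350.0, 6.0), 40123, "p2p"),
        ((1250.0, 3.0), 52000, "p2p")]


def run_demo():
    actual = [label for _, _, label in TEST]
    # System 1: supervised, port-aware classification.
    model = train_nearest_centroid(TRAIN)
    by_classifier = [classify(model, f, port) for f, port, _ in TEST]
    # System 2: cluster all flows without labels, then name clusters from TRAIN labels.
    points = [f for f, _ in TRAIN] + [f for f, _, _ in TEST]
    assignment = kmeans(points, k=2)
    names = label_clusters(assignment[:len(TRAIN)], [label for _, label in TRAIN])
    by_clustering = [names.get(a, "unknown") for a in assignment[len(TRAIN):]]
    return actual, by_classifier, by_clustering


if __name__ == "__main__":
    actual, by_classifier, by_clustering = run_demo()
    for name, predicted in (("classification", by_classifier), ("clustering", by_clustering)):
        rates = {c: round(true_positive_rate(actual, predicted, c), 2) for c in ("web", "p2p")}
        print(name, rates)
```

On these toy flows the classifier reaches a web TPR of 1.0 while clustering reaches 0.67: the invented bulk HTTPS flow on port 443 looks like P2P traffic by its features alone, so only the port-aware classifier labels it correctly. Features are not normalized here, so packet size dominates the Euclidean distance; a fuller sketch would rescale them first.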
Keywords: Traffic identification, Machine learning, Data mining, Classification algorithm, Cluster algorithm