In recent years, with the rapid growth in the number of Internet users and the diversification of services, Internet traffic has increased explosively and network behavior has become increasingly complex. This poses enormous challenges for network planning and network management, and network traffic analysis has therefore become an active topic in network research. Network traffic identification is the basis of traffic measurement and analysis. Existing identification technology relies mainly on two techniques: identification based on port numbers and identification based on application-layer protocol labels (payload signatures). These techniques offer high accuracy and are simple to implement. However, they have shortcomings that are hard to overcome: they require prior knowledge of the ports or protocol labels in use, and they cannot identify encrypted traffic. Algorithms based on machine learning can overcome these shortcomings. How to identify, classify, and control the traffic of different applications by means of accurate traffic features, efficient classification algorithms, and sound solutions is an urgent and important problem that needs to be studied and solved. This research is part of the project "Research of key technologies of intelligent optical access network" (60672025), funded by the National Natural Science Foundation of China. This thesis presents an in-depth, comprehensive study of the characteristics of traffic identification algorithms in high-speed access networks. It proposes machine-learning-based traffic identification algorithms for high-speed access networks and, through research and analysis, compares them with the two other kinds of traffic identification methods used in high-speed networks.
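The port-based approach and its main weakness can be illustrated with a minimal Python sketch. The sketch is not part of the system described in this thesis. The `Flow` type, the `classify_by_port` function, and the deliberately small port table are all hypothetical. A practical classifier would draw on the full IANA port registry.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, deliberately tiny table of well-known server ports.
WELL_KNOWN_PORTS = {
    20: "FTP-DATA",
    21: "FTP",
    22: "SSH",
    25: "SMTP",
    53: "DNS",
    80: "HTTP",
    110: "POP3",
    443: "HTTPS",
}


@dataclass(frozen=True)
class Flow:
    """A flow identified by its 5-tuple (illustrative only)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str  # "TCP" or "UDP"


def classify_by_port(flow: Flow) -> Optional[str]:
    """Return the application implied by either port, or None if neither port is in the table."""
    for port in (flow.dst_port, flow.src_port):
        app = WELL_KNOWN_PORTS.get(port)
        if app is not None:
            return app
    return None


flows = [
    Flow("192.0.2.10", "198.51.100.7", 51512, 80, "TCP"),    # web traffic on its standard port
    Flow("192.0.2.10", "198.51.100.9", 51513, 6881, "TCP"),  # P2P traffic on a port not in the table
    Flow("192.0.2.10", "198.51.100.7", 51514, 8080, "TCP"),  # HTTP moved to an alternate port
]
for f in flows:
    print(f.dst_port, classify_by_port(f))
# 80 HTTP
# 6881 None
# 8080 None
```

Flows on unregistered or reassigned ports fall through as unknown. This is the dependence on prior port knowledge noted above. Payload-based identification fails in a different way, because encryption hides the protocol labels it matches.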
We hope this work will contribute to the research and development of traffic identification for high-speed access networks. Combining theoretical study with simulation, the author carried out the following work: 1. Surveyed in detail the development of traffic identification at home and abroad, studied international trends in machine-learning-based traffic identification, and analyzed the advantages of this approach; 2. Studied the characteristics relevant to traffic identification, and discussed network flow measurement methods and the system of measurement metrics; 3. Analyzed the need for network traffic identification, compared the existing identification technologies, and set out their respective advantages and disadvantages and the development trends of the field; 4. Building on the identification technologies and the algorithm analysis of the preceding part, proposed and implemented a network traffic identification and analysis system, evaluated it with 11 supervised machine learning algorithms, and tested it on a broadband network against several metrics, such as classification accuracy, CPU usage, model-building time, and testing time; 5.
Concluded from experimental analysis that the decision-tree induction algorithms C4.5 and RandomTree, the rule-based OneR algorithm, and the Bayesian classification algorithm BayesNet are suitable for traffic identification in broadband networks. This thesis summarizes the author's theoretical study and practical research as a graduate student and consists of six parts. The 1st part describes the current state of the traditional Internet and the technical development of, and need for, network traffic research, and sets out the background, significance, goals, and key content of the thesis. The 2nd part studies the characteristics relevant to traffic identification, including self-similarity, long-range dependence, and heavy tails, and how these features appear in networks. It also discusses network flow measurement methods and the system of measurement metrics, and so prepares the ground for the subsequent work. The 3rd part analyzes the need for network traffic identification, compares the existing identification technologies, and sets out their advantages and disadvantages and the development trends of the field. The 4th part proposes and implements the network traffic identification and analysis system. It evaluates the system with 11 supervised machine learning algorithms and tests it on a broadband network using metrics such as classification accuracy, CPU usage, model-building time, and testing time. The 5th part presents the experimental analysis, which concludes that C4.5, RandomTree, OneR, and BayesNet are suitable for traffic identification in broadband networks. The 6th part summarizes the main achievements and innovations of the thesis, states the problems it solves, and outlines future research work.
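The evaluation procedure described above trains each supervised algorithm and then records accuracy, model-building time, testing time, and CPU cost. The following minimal Python sketch shows that loop for OneR, one of the recommended algorithms. It illustrates the procedure under stated assumptions and is not the thesis's system:

* OneR is implemented from scratch and handles only nominal (already discretized) features.
* The feature names `pkt_size`, `duration`, and `port_class`, the helper names, and the toy flows are hypothetical.
* The "CPU" metric is interpreted here as CPU time measured with `time.process_time`.

```python
import time
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, str]                     # one flow: hypothetical feature name -> discretized value
Model = Tuple[str, Dict[str, str], str]  # (chosen attribute, value -> class, default class)


def train_one_r(rows: List[Row], labels: List[str]) -> Model:
    """OneR: keep the single attribute whose one-level rule makes the fewest training errors.

    Sketch limitation: attributes must already be nominal; the usual
    discretization of numeric attributes is omitted.
    """
    default = Counter(labels).most_common(1)[0][0]
    best: Model = ("", {}, default)
    best_errors = len(labels) + 1
    for attr in rows[0]:
        # Class frequencies for every value this attribute takes.
        counts: Dict[str, Counter] = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[attr]][label] += 1
        # Each value predicts its majority class; the remaining instances count as errors.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c[rule[value]] for value, c in counts.items())
        if errors < best_errors:
            best, best_errors = (attr, rule, default), errors
    return best


def predict_one_r(model: Model, row: Row) -> str:
    attr, rule, default = model
    # A value never seen in training falls back to the overall majority class.
    return rule.get(row[attr], default)


def evaluate(train_rows, train_labels, test_rows, test_labels):
    """Return (model, accuracy, build seconds, test seconds, CPU seconds)."""
    cpu_start = time.process_time()

    start = time.perf_counter()
    model = train_one_r(train_rows, train_labels)
    build_time = time.perf_counter() - start

    start = time.perf_counter()
    predictions = [predict_one_r(model, row) for row in test_rows]
    test_time = time.perf_counter() - start

    cpu_time = time.process_time() - cpu_start
    correct = sum(p == y for p, y in zip(predictions, test_labels))
    return model, correct / len(test_labels), build_time, test_time, cpu_time


def flow(pkt_size: str, duration: str, port_class: str) -> Row:
    return {"pkt_size": pkt_size, "duration": duration, "port_class": port_class}


# Hypothetical toy flows (not the thesis dataset).
TRAIN_ROWS = [
    flow("small", "short", "well-known"), flow("small", "short", "dynamic"),
    flow("large", "short", "well-known"), flow("large", "long", "well-known"),
    flow("medium", "long", "dynamic"), flow("medium", "long", "dynamic"),
    flow("medium", "short", "well-known"),
]
TRAIN_LABELS = ["DNS", "DNS", "WWW", "WWW", "P2P", "P2P", "P2P"]
TEST_ROWS = [
    flow("small", "long", "dynamic"), flow("large", "short", "dynamic"),
    flow("medium", "long", "well-known"), flow("large", "long", "dynamic"),
]
TEST_LABELS = ["DNS", "WWW", "P2P", "P2P"]

model, accuracy, build_s, test_s, cpu_s = evaluate(TRAIN_ROWS, TRAIN_LABELS, TEST_ROWS, TEST_LABELS)
print("rule attribute:", model[0], "accuracy:", accuracy)  # rule attribute: pkt_size accuracy: 0.75
print(f"build {build_s:.6f}s, test {test_s:.6f}s, CPU {cpu_s:.6f}s")
```

OneR suits a short sketch because it keeps a single attribute and maps each of that attribute's values to the majority class. This is also why its model-building and testing costs tend to be low. Other classifiers, such as C4.5, RandomTree, or BayesNet, could be timed with the same harness.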