Research On Network Traffic Classification Based On Machine Learning

Posted on: 2016-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y Jiang
Full Text: PDF
GTID: 2348330503476730
Subject: Electronics and Communications Engineering
Abstract/Summary:
This thesis is supported by the project "Research and Application of Multi-dimensional Sensing Technology over Smart Pipe for Power Information and Communications Network", whose research target is service-oriented traffic identification and sensing. The main content of the thesis is the study of network traffic identification based on machine learning algorithms.

The thesis first introduces four kinds of methods used in network traffic classification: classification based on port mapping, on the payload, on host behavior, and on machine learning. By comparing the features of the four methods, it concludes that classification based on machine learning is the most suitable for the electric power communication network environment. It then focuses on three typical machine learning classification algorithms: the C4.5 decision tree, the Naive Bayes classifier, and the support vector machine. For each algorithm, analysis of the experimental data identifies the most suitable feature selection method and the best classification performance obtained. For the support vector machine, the effects of the g and C parameters on classification performance are also studied. Finally, the thesis proposes two new feature selection methods and applies them to the three machine learning algorithms for comparison with existing feature selection methods. The results show that the CIG feature selection method proposed in this thesis has strong universality and excellent performance.

The thesis is divided into five parts, and its main content is as follows.

The first chapter introduces the research background and purpose, together with the structure of the thesis.

The second chapter summarizes the four kinds of current classification methods. From an analysis of the features of these methods and the requirements of the electric power communication network, it draws the conclusion that classification based on machine learning is the most suitable for that environment. Feature selection and its application in classification are then introduced.

The third chapter focuses on three commonly used machine learning algorithms: the C4.5 decision tree, the Naive Bayes classifier, and the support vector machine. Through experiments, the classification performance of each algorithm on the Moore data set is analyzed, and the feature selection method and feature subset size suitable for each algorithm are given. The influence of parameter selection on the classification performance of the support vector machine is then discussed.

The fourth chapter proposes two new feature selection methods, CIG and CGR, and analyzes their performance when applied to the classification algorithms.

The last chapter summarizes the research work of the thesis and points out directions for future work.
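The abstract does not define CIG or CGR, so the following sketch does not attempt to reproduce them. It only illustrates, under stated assumptions, how a filter-style feature selection method can rank traffic features using two standard scores: information gain and gain ratio (the split criterion used by C4.5). The feature names, the discretized values, and the class labels below are hypothetical toy data, not taken from the Moore data set.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def information_gain(feature_values, labels):
    """Reduction in class entropy after partitioning flows by a discrete feature."""
    total = len(labels)
    partitions = {}
    for value, label in zip(feature_values, labels):
        partitions.setdefault(value, []).append(label)
    conditional = sum(len(part) / total * entropy(part) for part in partitions.values())
    return entropy(labels) - conditional


def gain_ratio(feature_values, labels):
    """Information gain normalised by the entropy of the feature itself."""
    split_info = entropy(feature_values)
    if split_info == 0:  # a constant feature carries no information
        return 0.0
    return information_gain(feature_values, labels) / split_info


def rank_features(columns, labels, score=information_gain):
    """Return feature names ordered from most to least informative."""
    return sorted(columns, key=lambda name: score(columns[name], labels), reverse=True)


# Hypothetical toy data: six flows, three already-discretized features.
labels = ["WWW", "WWW", "MAIL", "MAIL", "P2P", "P2P"]
columns = {
    "pkt_size_bin": ["small", "large", "small", "large", "small", "large"],
    "duration_bin": ["short", "short", "short", "long", "long", "long"],
    "server_port_bin": ["80", "80", "25", "25", "high", "high"],
}

ranking = rank_features(columns, labels)
print(ranking)
```

A feature subset of size k would then be the first k names in `ranking`. Passing `score=gain_ratio` to `rank_features` switches the criterion without changing the rest of the sketch.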
Keywords/Search Tags:Machine Learning, C4.5 Decision Tree, Naive Bayes, Support Vector Machine, Feature Selection