Machine Learning In Network Traffic Classification

Posted on: 2020-06-11  Degree: Master  Type: Thesis
Country: China  Candidate: F Y Wang  Full Text: PDF
GTID: 2428330596475533  Subject: Engineering
Abstract/Summary:
As the Internet grows and the number of users increases, network infrastructure becomes more and more complex. Internet providers and administrators are increasingly eager to classify network flows accurately and quickly to ensure high availability. Because application-layer protocols differ in their characteristics and labeling methods are limited, network flow data form a class-imbalanced dataset, in which the number of samples differs greatly between categories.

This thesis compares how well common model evaluation metrics characterize performance on imbalanced datasets, and introduces the concepts of macro- and micro-averaging from the field of text categorization. It then proposes an ensemble learning algorithm based on the C4.5 decision tree. The algorithm exploits the relative insensitivity of C4.5 decision trees to imbalanced datasets and uses gradient boosting to combine a sequence of weak C4.5 decision tree models, avoiding a bias toward the majority classes. Feature bagging and an L2 regularization penalty on model complexity are then added. The diversity among the weak C4.5 decision tree models improves the classification of minority classes. A comparison with common machine learning models shows that the ensemble learning algorithm greatly improves prediction of minority classes on the Moore dataset.

The thesis also explains why class imbalance in network flow data is inevitable and, on this basis, proposes a complete network flow classification framework. Data preprocessing, feature selection, random resampling, and cost-sensitive learning are applied to the network flow data, which are then fed to the classifier for training, and the results are compared with existing research. The results show that the framework greatly improves prediction of minority classes while ensuring that prediction of majority classes is not significantly reduced. For the feature selection stage, the thesis proposes a search algorithm and shows that it outperforms the common Best-First Search algorithm. Finally, it analyzes the impact of common resampling methods on prediction results on the Moore dataset.
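The macro- and micro-averaging mentioned in the abstract can be illustrated with a small sketch. It is not taken from the thesis: the function names, the two class labels, and the 95/5 split are hypothetical, chosen only to show why the two averages diverge on an imbalanced dataset. Macro-averaging computes F1 per class and then averages the scores, so every class counts equally. Micro-averaging pools the counts over all classes first, so large classes dominate.

```python
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Return {label: (tp, fp, fn)} for every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    return {c: (tp[c], fp[c], fn[c]) for c in labels}


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when the class never appears."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Average of per-class F1 scores: every class has equal weight."""
    counts = per_class_counts(y_true, y_pred)
    return sum(f1(*c) for c in counts.values()) / len(counts)


def micro_f1(y_true, y_pred):
    """F1 over counts pooled across classes: large classes dominate."""
    counts = per_class_counts(y_true, y_pred)
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)


# Hypothetical imbalanced sample: 95 majority flows, 5 minority flows,
# and a degenerate classifier that always predicts the majority class.
y_true = ["majority"] * 95 + ["minority"] * 5
y_pred = ["majority"] * 100

print(f"micro-F1 = {micro_f1(y_true, y_pred):.3f}")
print(f"macro-F1 = {macro_f1(y_true, y_pred):.3f}")
```

In this made-up case the micro-averaged score stays high even though no minority flow is ever detected, while the macro-averaged score drops to roughly half. A metric that hides this failure says little about how well the minority classes are predicted.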
Keywords/Search Tags:Network flow classification, imbalanced dataset, statistical feature extraction, machine learning, integrated algorithm