Research on Imbalanced Network Traffic Classification Algorithm Based on Supervised Learning

Posted on: 2021-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: D Liu
GTID: 2428330614458211
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of Internet technology, network traffic classification has become an important part of network management and network security. Machine-learning-based traffic classification has attracted widespread attention from researchers because of its high classification accuracy and strong scalability. However, two problems remain prominent: traffic samples are unevenly distributed across classes, and classification models are not updated in time. This thesis studies network traffic classification methods based on supervised learning. The main research contents are as follows:

1. Because network traffic samples are unevenly distributed, the classifier's performance is dominated by the majority class and the classification accuracy of the minority class is neglected. To address this problem, a new network traffic feature selection algorithm based on correlation filtering is proposed. First, using the class distribution information, a feature metric named weighted symmetric uncertainty is defined, which is biased toward the minority class. The weighted symmetric uncertainty between each feature and the class label is calculated and compared with a threshold, and irrelevant features are deleted. Then, based on the weighted symmetric uncertainty, the Markov blanket condition is redefined to obtain an approximate Markov blanket, which is used to remove redundant features. Finally, a correlation-based feature metric and a sequential search algorithm are used to further reduce the feature dimension and obtain the optimal feature subset. Experimental results show that the proposed feature selection algorithm deals effectively with imbalanced class distribution and improves the recognition rate of the minority class without sacrificing the overall accuracy of the classifier.

2. In network traffic classification, it is difficult to update the classification model frequently and in a timely manner, so the classification accuracy of the model gradually decreases over time. To address this problem, an imbalanced network traffic classification model based on ensemble learning is proposed. First, a base classifier is trained on the existing data set; during data preprocessing, the algorithm proposed in research content 1 is used for feature selection. Then, the idea of incremental learning is introduced to detect concept drift in the system at an early stage. If concept drift occurs and reaches a certain level, the arriving network flows and their classification results from the base classifier are used as a new data set, and a new base classifier is trained on it. Finally, the new base classifier is added to the ensemble classification system and takes part in the next stage of network traffic identification. Experimental results show that the proposed classification model reduces the impact of concept drift and improves both the overall prediction ability of the system and the prediction ability for individual applications.
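
The relevance-filtering step in research content 1 can be illustrated with a minimal sketch. The abstract does not give the definition of weighted symmetric uncertainty, so the weighting below is an assumption made only for illustration: each sample is weighted by the inverse of its class size, so that every class carries equal total mass, and the standard symmetric uncertainty SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)) is computed on the reweighted distribution. The function names, the toy traffic labels, and the threshold value are all hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter, defaultdict


def entropy(mass_by_value):
    """Shannon entropy (bits) of a distribution given as unnormalized masses."""
    total = sum(mass_by_value.values())
    return -sum((m / total) * math.log2(m / total)
                for m in mass_by_value.values() if m > 0)


def weighted_symmetric_uncertainty(feature, labels):
    """SU between a discrete feature and the class label on a class-reweighted sample.

    ASSUMPTION: each sample is weighted by 1 / (size of its class); the thesis
    may define the weighting differently.
    """
    class_sizes = Counter(labels)
    mass_x, mass_y, mass_xy = defaultdict(float), defaultdict(float), defaultdict(float)
    for x, y in zip(feature, labels):
        w = 1.0 / class_sizes[y]
        mass_x[x] += w
        mass_y[y] += w
        mass_xy[(x, y)] += w
    h_x, h_y, h_xy = entropy(mass_x), entropy(mass_y), entropy(mass_xy)
    if h_x + h_y == 0:
        return 0.0
    mutual_information = h_x + h_y - h_xy
    return 2.0 * mutual_information / (h_x + h_y)


def filter_irrelevant(columns, labels, threshold):
    """Keep the names of features whose weighted SU with the label exceeds the threshold."""
    return [name for name, values in columns.items()
            if weighted_symmetric_uncertainty(values, labels) > threshold]


# Toy imbalanced data: 8 majority-class flows and 2 minority-class flows.
labels = ["web"] * 8 + ["p2p"] * 2
columns = {
    "separating_feature": [0] * 8 + [1] * 2,  # fully determined by the class
    "constant_feature": [0] * 10,             # carries no class information
}
selected = filter_irrelevant(columns, labels, threshold=0.5)
```

In this toy data, `separating_feature` is kept and `constant_feature` is dropped. Continuous flow features would need to be discretized before this kind of entropy estimate applies, and the redundancy removal with the approximate Markov blanket is a separate step not shown here.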
Keywords/Search Tags: traffic classification, class imbalance, supervised learning, feature selection, ensemble learning