Research On The Technology For Network Traffic Identification Based On Machine Learning

Posted on: 2020-11-23
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Sun
GTID: 2428330599952878
Subject: Engineering
Abstract/Summary:
With the surge in Internet users and the rapid expansion of networks, the types of network services have become more complex. To provide a secure and reliable network environment, networks must be managed efficiently. As a core technology of network management, traffic identification provides data support for network behavior analysis, bandwidth resource allocation, network operation management, and network architecture improvement. However, traditional traffic identification methods based on port numbers, deep packet inspection, and behavioral characteristics can no longer meet current needs. With the rapid development of artificial intelligence and the maturing of machine learning, these techniques are now widely used in many fields. Using machine learning algorithms to identify traffic can deliver both accurate identification and a robust system. Research on machine-learning-based network traffic identification therefore has academic value and promising prospects for application.

Network traffic data is large in scale and carries a very large number of feature attributes. To improve identification efficiency, feature selection is needed to reduce the feature set. At the same time, the class imbalance of service traffic poses a major challenge to identification. This thesis proposes a multi-stage feature selection algorithm based on weighted symmetric uncertainty that filters features stage by stage. First, weighted symmetric uncertainty is used to balance the weights of the majority and minority classes, which alleviates class imbalance, and to eliminate irrelevant features. Second, the correlation between features is estimated with the Pearson correlation coefficient, and redundant features are screened out. Finally, the optimal feature subset is found with a tabu search strategy. Theoretical analysis and simulation experiments show that the algorithm achieves rapid dimensionality reduction of network traffic features. Compared with machine-learning-based feature selection algorithms proposed in recent years, it shows considerable advantages in feature dimension, classification speed, and recognition accuracy.

Although feature selection effectively reduces feature dimensionality and simplifies the machine learning task, traffic identification with a single classifier is not sufficiently stable. Moreover, network traffic often exhibits concept drift over time, which is a major challenge for traffic identification. To reduce the influence of concept drift on recognition performance, this thesis proposes a multi-classifier ensemble learning algorithm. Built on the Bagging ensemble learning framework, the algorithm divides the data stream into consecutive sub-blocks, introduces an adaptive window mechanism to detect concept drift in network traffic, dynamically adjusts the weight of each base classifier in the fused classifier, updates the system model with an incremental learning strategy, and combines and optimizes the classification results to output the final predicted application category. The optimal size of the network traffic data blocks and the optimal number of base classifiers are determined experimentally. Theoretical analysis and simulation experiments show that, compared with single-classifier algorithms and the conventional Bagging algorithm, this algorithm handles concept drift effectively and improves the accuracy and stability of traffic identification.
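
The first two filtering stages described in the abstract (relevance by symmetric uncertainty, then redundancy by Pearson correlation) can be illustrated with a minimal sketch. This is not the thesis's algorithm: the abstract does not give the class-weighting scheme, so the code uses the standard unweighted symmetric uncertainty, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), and it omits the tabu search stage. The function names, the thresholds `su_min` and `corr_max`, and the toy feature names are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """Standard (unweighted) SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y))."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    joint = entropy(list(zip(x, y)))   # H(X, Y)
    mutual_info = hx + hy - joint      # I(X; Y)
    return 2.0 * mutual_info / (hx + hy)


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_features(features, labels, su_min=0.1, corr_max=0.9):
    """features: dict mapping feature name -> list of discretized numeric values.

    su_min and corr_max are hypothetical thresholds, not values from the thesis.
    """
    # Stage 1: drop features whose SU with the class label is below su_min.
    scores = {name: symmetric_uncertainty(col, labels)
              for name, col in features.items()}
    relevant = sorted((n for n in scores if scores[n] >= su_min),
                      key=lambda n: -scores[n])
    # Stage 2: walk features from most to least relevant and keep one only
    # if it is not strongly correlated with a feature already kept.
    kept = []
    for name in relevant:
        if all(abs(pearson(features[name], features[k])) < corr_max for k in kept):
            kept.append(name)
    return kept


# Toy data with made-up feature names.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "pkt_size_bin": [0, 0, 0, 0, 1, 1, 1, 1],   # fully informative
    "pkt_size_copy": [0, 0, 0, 0, 2, 2, 2, 2],  # redundant with the first
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],          # independent of the label
}
selected = select_features(features, labels)
print(selected)
```

On this toy data the `noise` column is dropped in stage 1 because its symmetric uncertainty with the label is zero, and `pkt_size_copy` is dropped in stage 2 because it is perfectly correlated with `pkt_size_bin`. Continuous flow features would need to be discretized before the entropy estimates are meaningful.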
Keywords/Search Tags:Traffic Identification, Imbalanced Class, Feature Selection, Concept Drift, Ensemble Learning