Network traffic classification monitors the status of a network by analyzing traffic data, giving network managers a basis for their decisions. The backbone network is the core component of the Internet, but its high transmission speed and the rapid evolution of its traffic pose new challenges for traffic classification. First, most existing traffic classification methods extract features from the full traffic, which consumes a great deal of time and resources when applied to large-scale backbone traffic. Second, backbone traffic classification must process massive volumes of traffic in a relatively short time and with limited memory, so the time complexity and space complexity of the classification method must be low. Third, as backbone traffic evolves rapidly, the mapping between feature vectors and classification labels changes, i.e., network flow concept drift occurs, which degrades the performance of the classification model.

To address these challenges, this paper proposes a traffic classification scheme for the backbone network. The scheme can accurately classify sampled traffic in the backbone network and adaptively update the classification model when concept drift occurs. The main work accomplished in this paper is as follows.

(1) We propose a classification method based on a batch classifier to classify backbone network traffic. The method comprises a Multiple Counter Sketch (MC Sketch) and a Batch Classifier based on Agglomerative Clustering (BCAC). First, the MC Sketch quickly extracts features from the sampled data streams. Next, the BCAC performs unsupervised clustering of the traffic features in a reasonable time and with limited memory. Finally, a supervised machine learning algorithm constructs a classification model from the traffic data that were automatically labeled in the clustering results. The experimental results show that the method quickly extracts features from the sampled traffic and that the model's classification accuracy can reach 96.3% even at a sampling rate of 1:1024. Moreover, the method can adaptively reduce the space complexity and time complexity of the BCAC model by reducing its block size.

(2) We propose a concept drift solution for network streams based on batch updating. The solution consists of a concept drift detector and a batch updater. The concept drift detector determines whether network traffic concept drift has occurred by monitoring the classification results, and it adaptively triggers the update of the classification model based on the number of drift traffic samples. The batch updater performs batch updating on the drift traffic samples and the historical traffic samples to obtain updated clustering samples, and then trains a supervised machine learning classification algorithm on these samples to complete the update of the classification model. The experimental results show that the solution successfully detects the occurrence of concept drift, that the average accuracy of the updated classification model is 96%, and that the total time required by the batch update method is only 52.5% of that required by the one-time update method.

(3) We design and implement a prototype traffic classification system for the backbone network. The design and implementation of the prototype are described in terms of feature extraction, classification model generation, concept drift detection, classifier batch update, and system presentation. The experimental results show that the system's time consumption and memory usage are low enough to classify backbone network traffic, and that its classification performance and processing speed are better than those of similar work.
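
As an illustration of the count-triggered update described in contribution (2), the following minimal Python sketch shows one way a detector could buffer drift samples and hand them to a batch updater. It is not the implementation used in this paper: the names `DriftDetector`, `min_confidence`, `trigger_count`, and `split_into_batches` are hypothetical, and treating a low-confidence classification result as a drift sample is an assumption made only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence

Features = Sequence[float]


@dataclass
class DriftDetector:
    """Hypothetical count-triggered drift detector (illustrative only)."""

    min_confidence: float = 0.6  # assumed: results below this count as drift samples
    trigger_count: int = 3       # assumed: number of drift samples that triggers an update
    _drift_buffer: List[Features] = field(default_factory=list, init=False)

    def observe(self, features: Features, confidence: float) -> Optional[List[Features]]:
        """Record one classification result.

        Returns the buffered drift samples once enough have accumulated to
        trigger a model update; otherwise returns None.
        """
        if confidence < self.min_confidence:
            self._drift_buffer.append(features)
        if len(self._drift_buffer) >= self.trigger_count:
            # Hand over the buffered samples and start a fresh buffer.
            batch, self._drift_buffer = self._drift_buffer, []
            return batch
        return None


def split_into_batches(samples: List[Features], batch_size: int) -> List[List[Features]]:
    """Split samples into consecutive batches so an updater can process them one batch at a time."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]
```

In such a design, each classification result would be passed to `observe`; when it returns a list, those drift samples, together with historical samples, could be split with `split_into_batches` and fed to the clustering step batch by batch before the supervised classifier is retrained.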