
Research On Machine Learning-based Early Classification For Network Flows

Posted on: 2023-01-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Xiang
Full Text: PDF
GTID: 2568306836468754
Subject: Signal and Information Processing
Abstract/Summary:
With the continuous development of Internet technology and the increase in network speeds, the volume of network traffic arriving at any instant can be very large, so emphasizing accuracy alone is not enough for network traffic classification; the real-time performance of classification must also be considered. Timely and accurate traffic classification is also important for network resource management and for providing users with a good Quality of Experience (QoE). Early classification of network flows is completed in the early stage of a flow, so it can cope with high-speed network environments and offers good real-time performance. Early classification of network flows therefore has important research and application value. The main work of this thesis consists of the following three parts.

First, by analyzing the size and arrival order of network packets, a conditional frequency feature and a rate sequence are proposed. The original packet sequence is divided into an upstream packet sequence and a downstream packet sequence; the packet sizes in each sequence are then assigned to levels by equal-width binning; finally, for each direction, the number of times a packet of level i is immediately followed by a packet of level j is counted and recorded as the conditional frequency. The packet size sequence in a given direction is divided by the corresponding time interval sequence, and the result is recorded as the rate sequence. Experimental results show that the conditional frequency and the statistical features of the rate sequence effectively improve the recognition accuracy of early flow classification.

Second, time complexity analysis and correlation analysis are combined for feature selection. To improve the real-time performance of classification, feature extraction time must be kept as short as possible; a time complexity analysis is therefore performed on the extracted network flow features, and the features with lower time complexity are selected. Then, to reduce feature dimensionality and redundancy, feature correlation analysis is carried out by combining filter and embedded feature selection methods, and the optimal feature set is selected according to the influence of the features on model accuracy. Experiments show that this feature selection method effectively reduces feature extraction time and improves the real-time performance of classification.

Third, to achieve higher classification accuracy, two frameworks for early traffic classification are designed. The first is based on a cascade structure: a video vs. non-video binary classification is completed using the packets that arrive earliest, and subsequent multi-class classifiers use the later packets to complete their respective multi-class classification. A streaming computation method is also introduced to speed up feature computation, and the later stage of this method does not need to store earlier packets, which saves storage space. The second framework is based on multi-segment voting: three consecutive packet sequences, each containing the same number of packets, are extracted from the same flow; the pre-selected features are extracted from each sequence and fed into the same random forest model for classification. The classification results of the three packet sequences are then put to a vote to produce the final prediction. Experiments show that the classification accuracy of the proposed method is better than that of existing methods. Finally, the impact of different hyperparameters on the performance of random forest classifiers is analyzed.
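
The conditional frequency and rate sequence described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function names, the packet tuple layout (timestamp in milliseconds, size in bytes, direction), the 1500-byte maximum size, and the choice of four levels are all assumptions made for illustration. The abstract does not say which packet's size is paired with each time interval; the sketch assumes the later packet of each adjacent pair.

```python
from collections import Counter

def split_by_direction(packets):
    """Split (time_ms, size, direction) tuples into upstream and downstream lists."""
    up = [(t, s) for t, s, d in packets if d == "up"]
    down = [(t, s) for t, s, d in packets if d == "down"]
    return up, down

def size_level(size, max_size=1500, n_levels=4):
    """Equal-width binning: map a packet size to a level in 0..n_levels-1."""
    width = max_size / n_levels
    return min(int(size / width), n_levels - 1)

def conditional_frequency(sizes, max_size=1500, n_levels=4):
    """Count how often a level-i packet is immediately followed by a level-j packet."""
    levels = [size_level(s, max_size, n_levels) for s in sizes]
    return Counter(zip(levels, levels[1:]))

def rate_sequence(times, sizes):
    """Divide each packet size by the gap since the previous same-direction packet.

    Assumption: the size of the later packet in each pair is used, and
    zero-length gaps are skipped to avoid division by zero.
    """
    rates = []
    for k in range(1, len(sizes)):
        gap = times[k] - times[k - 1]
        if gap > 0:
            rates.append(sizes[k] / gap)
    return rates

# Hypothetical first packets of one flow: (time in ms, size in bytes, direction).
packets = [
    (0, 100, "up"),
    (10, 1400, "down"),
    (20, 1400, "down"),
    (50, 200, "up"),
    (60, 700, "down"),
]
up, down = split_by_direction(packets)
down_times = [t for t, _ in down]
down_sizes = [s for _, s in down]
down_cf = conditional_frequency(down_sizes)       # pair counts keyed by (i, j)
down_rates = rate_sequence(down_times, down_sizes)  # bytes per millisecond
```

Statistical features such as the mean or variance of `down_rates`, together with the counts in `down_cf`, could then serve as classifier inputs. The abstract does not specify which statistics the thesis uses.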
Keywords/Search Tags: early traffic classification, conditional frequency, feature selection, hierarchical classification structure, multi-segment voting