
Research On Classification Of Unbalanced Network Traffic

Posted on: 2019-03-25
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhang
GTID: 2428330548491231
Subject: Computer application technology
Abstract/Summary:
When faced with multiclass imbalanced network traffic, machine learning-based traffic classification methods often neglect the classification performance of the minority classes because they optimize for overall classification accuracy. Although many resampling algorithms address imbalance, they mainly target two-class imbalance and are difficult to apply directly to the complex multiclass imbalance found in traffic classification. These algorithms also focus on imbalance between classes and give little consideration to the inherent within-class imbalance that may exist in network traffic. In addition, when classifying imbalanced data, the typical ensemble learning algorithm AdaBoost.M2 is inevitably affected by multiclass imbalance, which limits its overall classification ability. To address these problems, this thesis proposes a resampling algorithm for multiclass imbalance in network traffic classification, and an improved algorithm that raises AdaBoost.M2's performance on imbalanced network traffic. The main work is as follows:

(1) Network traffic data is analyzed in depth from two perspectives: apparent imbalance and internal imbalance. For apparent imbalance, the analysis focuses on the imbalance characteristics measured in flows and in bytes. For internal imbalance, the potential within-class sub-concepts, sample overlap, and sample noise are studied through the distribution of network traffic in feature space. Experiments and analysis are then used to study the correlation between multiclass imbalance characteristics and the performance of network traffic classification.

(2) For resampling, an algorithm named HMMS (Heuristic Multiclass Hybrid Sampling) is proposed. The algorithm addresses both inter-class and intra-class imbalance: the minority classes are first oversampled by synthesizing artificial samples, and the majority classes are then heuristically undersampled by clustering, taking into account sub-class concepts, sample overlap, and sample noise, to construct a balanced dataset. The experimental results show that, while preserving the overall flow classification accuracy, the proposed algorithm greatly improves the flow F-Measure of some minority classes and also significantly improves the overall flow G-Mean and the overall byte G-Mean.

(3) For ensemble learning, an improved AdaBoost.M2 algorithm, RBWS (Random Balance Sampling Based on Weighting)-ADAM2 (AdaBoost.M2), is proposed. At each iteration of AdaBoost.M2, a weighting-based random balanced resampling algorithm preprocesses the training data, which alleviates the impact of data imbalance on the classification performance of the minority classes and improves the generalization ability of the algorithm. The experimental results show that the proposed algorithm greatly improves the flow F-Measure of some minority classes and also improves the overall flow G-Mean and the overall average flow F-Measure of the ensemble classifier, enhancing its overall performance.

(4) Based on the RBWS-ADAM2 algorithm proposed in this thesis, a network traffic classification system is designed and implemented. It provides a complete traffic classification workflow covering network traffic capture, data processing, training, and classification. The system was run in a real network environment and compared with the AdaBoost.M2 algorithm. The verification results show that the system has high practical value.
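The evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes the common definitions: per-class F-Measure as the harmonic mean of precision and recall, and G-Mean as the geometric mean of the per-class recalls. The byte-level variant is assumed here to weight each flow by its byte count; the thesis may define it differently. All function names and the toy labels are hypothetical, not taken from the thesis.

```python
from collections import defaultdict
from math import prod


def class_stats(y_true, y_pred, weights=None):
    """Per-class true-positive, false-positive and false-negative totals.

    Each flow counts as 1 by default; passing per-flow byte counts as
    `weights` gives byte-level totals (an assumed definition).
    """
    if weights is None:
        weights = [1.0] * len(y_true)
    tp, fp, fn = defaultdict(float), defaultdict(float), defaultdict(float)
    for t, p, w in zip(y_true, y_pred, weights):
        if t == p:
            tp[t] += w
        else:
            fn[t] += w  # missed flow of the true class
            fp[p] += w  # spurious flow for the predicted class
    return tp, fp, fn


def recall_by_class(y_true, y_pred, weights=None):
    """Recall for every class that occurs in y_true."""
    tp, _, fn = class_stats(y_true, y_pred, weights)
    out = {}
    for c in sorted(set(y_true)):
        total = tp[c] + fn[c]
        out[c] = tp[c] / total if total > 0 else 0.0
    return out


def f_measure_by_class(y_true, y_pred, weights=None):
    """F-Measure (harmonic mean of precision and recall) per class."""
    tp, fp, fn = class_stats(y_true, y_pred, weights)
    out = {}
    for c in sorted(set(y_true)):
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        precision = tp[c] / predicted if predicted > 0 else 0.0
        recall = tp[c] / actual if actual > 0 else 0.0
        denom = precision + recall
        out[c] = 2 * precision * recall / denom if denom > 0 else 0.0
    return out


def g_mean(y_true, y_pred, weights=None):
    """Geometric mean of the per-class recalls."""
    recalls = list(recall_by_class(y_true, y_pred, weights).values())
    return prod(recalls) ** (1.0 / len(recalls))


# Toy example: four majority-class flows and two minority-class flows.
y_true = ["web", "web", "web", "web", "p2p", "p2p"]
y_pred = ["web", "web", "web", "web", "web", "p2p"]
flow_bytes = [100, 100, 100, 100, 1000, 10]  # hypothetical byte counts

flow_g_mean = g_mean(y_true, y_pred)
byte_g_mean = g_mean(y_true, y_pred, weights=flow_bytes)
flow_f = f_measure_by_class(y_true, y_pred)
```

In this toy case five of the six flows are classified correctly, yet the minority class has a recall of only 0.5, which pulls the G-Mean well below the overall accuracy. This is the effect the abstract describes when it says accuracy-oriented classifiers neglect the minority classes.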
Keywords/Search Tags: Network traffic classification, Multiclass imbalance, Resampling algorithm, Ensemble learning algorithm