
Research And Implementation Of Traffic Classification Based On Deep Flow Inspection

Posted on: 2016-10-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
GTID: 2308330503478054
Subject: Computer technology
Abstract/Summary:
Internet traffic classification helps Internet service providers guarantee quality of service, supports effective supervision and management of the network, and contributes to network security. With the rapid development of networks, new applications keep emerging, and the widespread use of private protocols and encrypted applications in particular has steadily narrowed the applicable scope of the DPI method. The DFI method uses the statistical characteristics of flows to identify different applications without analyzing the application-layer payload. Its processing speed is therefore high, and it remains effective for encrypted messages and private protocols. Additionally, the DFI method needs no additional equipment overhead. At present, Internet traffic classification using machine learning based on DFI is a promising alternative. However, this approach treats high overall accuracy as the optimization goal and ignores the imbalanced nature of flows, so classification performance is biased towards the majority classes and neglects the minority classes. In Internet traffic, some minority classes contain signaling flows or real-time communication flows, and their classification performance influences communication quality and user experience. Other minority classes account for a large number of bytes, and their classification performance affects network planning and the allocation of bandwidth resources.

This research addresses these problems on the basis of DFI (Deep Flow Inspection). The main contents are as follows:

1. The influence of feature selection on traffic classification. Feature selection is important for traffic classification based on DFI. With redundant and irrelevant features, traffic classification has high computational and space complexity. A feature selection algorithm uses an evaluation strategy to pick out the more discriminative features, which improves classification accuracy. This thesis proposes a hybrid feature selection algorithm based on selective ensemble and an improved Sequential Forward Selection, and compares it with FCBF, InfoGain, GainRatio, Chi-square, and Consistency. The experiments show that the hybrid feature selection better distinguishes the correlation between features and classes.

2. An algorithm model based on cost sensitivity. Because network traffic is imbalanced, classification performance is biased towards the majority classes and neglects the minority classes. To improve the classification performance of minority classes, this thesis proposes a cost-sensitive model. It first uses SMOTE resampling to balance the majority and minority classes, then applies AdaCost with a weighted misclassification cost matrix. The experiments show that, compared with C4.5, this model improves the flow accuracy and byte accuracy of minority classes.

3. A cost-sensitive algorithm model with multiple classifiers. Because the characteristics of network traffic change with time and environment, the stability of a machine learning classification method is difficult to maintain. To improve the adaptive ability of the classifier, accuracy weighted ensemble learning is proposed. The experimental results show that the algorithm exhibits better classification performance and generalization ability under concept drift. To further improve the classification performance of minority classes in a concept drift environment, accuracy weighted ensemble learning based on the cost-sensitive model is proposed, which consists of two parts. The first part is the hybrid feature selection algorithm, which obtains a stable optimal feature subset. The second part is accuracy weighted ensemble learning based on AdaCost with a weighted misclassification cost matrix. The experimental results show that the model effectively improves flow accuracy and byte accuracy in a concept drift environment.
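The abstract reports results as flow accuracy and byte accuracy for minority classes but does not define them. The sketch below assumes the definitions commonly used in traffic classification: per-class flow accuracy is the fraction of a class's flows that are labeled correctly, and per-class byte accuracy is the fraction of that class's bytes carried by correctly labeled flows. The function name, the tuple layout, and the toy data are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def per_class_accuracy(flows):
    """Return {label: (flow_accuracy, byte_accuracy)}.

    flows: iterable of (true_label, predicted_label, byte_count) tuples.
    This layout is an assumption made for illustration.
    """
    flow_total = defaultdict(int)
    flow_hit = defaultdict(int)
    byte_total = defaultdict(int)
    byte_hit = defaultdict(int)

    for true_label, predicted_label, byte_count in flows:
        flow_total[true_label] += 1
        byte_total[true_label] += byte_count
        if predicted_label == true_label:
            flow_hit[true_label] += 1
            byte_hit[true_label] += byte_count

    result = {}
    for label in flow_total:
        flow_acc = flow_hit[label] / flow_total[label]
        # Guard against a class whose flows all carry zero bytes.
        byte_acc = byte_hit[label] / byte_total[label] if byte_total[label] else 0.0
        result[label] = (flow_acc, byte_acc)
    return result


# Hypothetical toy data: "voip" is the minority class.
flows = [
    ("web", "web", 1000),
    ("web", "web", 2000),
    ("web", "web", 1000),
    ("voip", "voip", 100),
    ("voip", "web", 900),
]
result = per_class_accuracy(flows)
overall_flow_accuracy = sum(1 for t, p, _ in flows if t == p) / len(flows)
```

In this toy data the overall flow accuracy is 0.8, yet the minority class has a flow accuracy of 0.5 and a byte accuracy of 0.1. This is the kind of gap the abstract describes when overall accuracy is the only optimization goal.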
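The abstract does not give the weighting formula used by the accuracy weighted ensemble. As a minimal sketch only, the code below assumes that each member classifier is weighted by its accuracy on the most recent labeled chunk of traffic, and that the ensemble predicts by weighted vote. The stand-in classifiers, the feature name `mean_pkt_size`, and the class labels are hypothetical; in the thesis the members are built with AdaCost, which is not reproduced here.

```python
from collections import defaultdict


def chunk_accuracy(classifier, chunk):
    """Fraction of (features, label) pairs in chunk that classifier gets right."""
    correct = sum(1 for features, label in chunk if classifier(features) == label)
    return correct / len(chunk)


def weighted_vote(classifiers, weights, features):
    """Return the label with the largest sum of member weights."""
    scores = defaultdict(float)
    for classifier, weight in zip(classifiers, weights):
        scores[classifier(features)] += weight
    return max(scores, key=scores.get)


# Hypothetical stand-in members: simple thresholds on mean packet size.
def old_model(features):
    # Trained before the drift: threshold is now too high.
    return "bulk" if features["mean_pkt_size"] > 1000 else "interactive"


def new_model(features):
    # Trained after the drift.
    return "bulk" if features["mean_pkt_size"] > 600 else "interactive"


def always_bulk(features):
    return "bulk"


members = [old_model, new_model, always_bulk]

# Most recent labeled chunk, collected after the traffic has drifted.
latest_chunk = [
    ({"mean_pkt_size": 700}, "bulk"),
    ({"mean_pkt_size": 800}, "bulk"),
    ({"mean_pkt_size": 1200}, "bulk"),
    ({"mean_pkt_size": 200}, "interactive"),
]

# Re-weight every member on the latest chunk so stale members lose influence.
weights = [chunk_accuracy(member, latest_chunk) for member in members]

prediction_small = weighted_vote(members, weights, {"mean_pkt_size": 300})
prediction_medium = weighted_vote(members, weights, {"mean_pkt_size": 700})
```

Recomputing the weights on each new chunk is what lets such an ensemble follow concept drift: the stale `old_model` stays in the pool but is outvoted once its accuracy on recent traffic drops.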
Keywords/Search Tags: traffic classification, imbalanced property, feature selection, cost-sensitive, ensemble learning