Statistical Traffic Classification Method And Application With Mislabelled Training Samples

Posted on: 2017-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: B F Wang
Full Text: PDF
GTID: 2308330503483636
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, network management and network security have become major concerns for governments, enterprises and individuals. Online applications for obtaining information and providing entertainment make people's lives more convenient. The volume of network traffic is increasing dramatically because of the openness of the Internet. How effectively this traffic is analyzed influences the performance of network management and network security control. Traffic classification is a fundamental tool for modern network management and security. Traditional traffic classification technologies face significant challenges. For example, port-based technology identifies an application by its conventional port numbers, but many current applications use dynamic ports. Payload-based technology suffers from two critical problems: encrypted applications and privacy constraints. This work therefore combines statistical characteristics with machine learning in the traffic classification process to reduce the impact of these limitations. Recent research often assumes that training data are clean. In reality, however, network traffic contains a large amount of noise traffic (mislabelled training data). The performance of such classifiers is severely compromised when the training data are contaminated with noise, especially incorrect class labels. It is therefore worthwhile to evaluate the performance of typical machine learning algorithms with mislabelled training data, in order to support academia and practitioners in choosing proper traffic classification methods in real-world scenarios. Based on the above, we mainly carry out the following two tasks: (1) Regarding the mislabelled traffic classification problem, the performance of several mainstream classification methods is evaluated through experiments and analysis to support related work.
To evaluate the algorithms along multiple dimensions, we use overall accuracy, F-measure, unclean overall average accuracy, overall noise tolerant ratio, unclean per-class average accuracy and per-class noise tolerant ratio as metrics. We vary the size of the clean training dataset to observe the performance of the different algorithms, and we vary the size of the mislabelled training dataset to evaluate their noise tolerance. The results show that mislabelled traffic weakens the robustness of the classification model and reduces classification accuracy. Some algorithms maintain good performance even under the extremely difficult circumstances created by mislabelled training data, but most algorithms are greatly affected by the mislabelled traffic. (2) Based on the above analysis, we put forward a new noise-resistant statistical traffic classification scheme. The scheme incorporates noise cleaning and noise tolerating into the classification process. The two methods are based on combined classifiers and collaborative filtering. Additionally, the noise-tolerating phase randomly samples a wk percentage of the training dataset to build a robust training dataset. Finally, the scheme uses the random forest algorithm to train a robust classifier. The performance of the proposed method is measured and compared with that of the mainstream methods and of network traffic classification using correlation information.
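The noise-cleaning and noise-tolerating steps of the scheme can be illustrated with a minimal Python sketch. It assumes a plain majority-vote filter over the predictions of several classifiers, which is only one possible reading of "combined classifiers"; the collaborative filtering component and the final random forest training are not shown. The names `majority_vote_filter` and `sample_fraction`, and the `fraction` parameter standing in for the sampled percentage, are hypothetical and do not come from the thesis.

```python
import random
from collections import Counter


def majority_vote_filter(labels, predictions_per_classifier):
    """Return indices of samples that are NOT flagged as mislabelled.

    A sample is flagged when a strict majority of the classifiers predict
    the same label and that label differs from the given training label.
    `predictions_per_classifier` is a list of prediction lists, one per
    classifier, each aligned with `labels` (a hypothetical input format).
    """
    n_classifiers = len(predictions_per_classifier)
    kept = []
    for i, label in enumerate(labels):
        votes = Counter(preds[i] for preds in predictions_per_classifier)
        top_label, top_count = votes.most_common(1)[0]
        flagged = top_label != label and top_count > n_classifiers / 2
        if not flagged:
            kept.append(i)
    return kept


def sample_fraction(indices, fraction, rng):
    """Draw round(fraction * n) indices at random, without replacement."""
    k = round(fraction * len(indices))
    return sorted(rng.sample(list(indices), k))


# Toy example: four flows, three classifiers (all values are made up).
labels = ["web", "web", "p2p", "p2p"]
predictions = [
    ["web", "p2p", "p2p", "p2p"],
    ["web", "p2p", "p2p", "web"],
    ["web", "web", "p2p", "p2p"],
]
clean_indices = majority_vote_filter(labels, predictions)
robust_indices = sample_fraction(clean_indices, 0.67, random.Random(0))
```

In this sketch the second flow is dropped because two of the three classifiers agree on a label different from the one it was given. The surviving indices would then be subsampled and passed to whatever learner builds the final classifier.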
The results show that the proposed method outperforms the other methods and maintains good performance under extremely difficult circumstances. We hope that the performance analysis of mislabelled traffic classification and the new noise-resistant statistical traffic classification scheme provide a valuable reference for network traffic classification and effectively improve classification performance on mislabelled traffic.
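The evaluation setup of task (1), in which a controlled share of training labels is corrupted and the classifier is then scored, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: noise is injected by flipping labels uniformly at random to a different class, and only overall accuracy and the per-class F-measure are computed. The thesis-specific noise tolerant ratios are not defined in this abstract and are therefore not reproduced. The function names and class names are hypothetical.

```python
import random


def inject_label_noise(labels, noise_ratio, classes, rng):
    """Return a copy of `labels` with round(noise_ratio * n) entries
    replaced by a different class chosen uniformly at random."""
    noisy = list(labels)
    n_flip = round(noise_ratio * len(noisy))
    for i in rng.sample(range(len(noisy)), n_flip):
        noisy[i] = rng.choice([c for c in classes if c != noisy[i]])
    return noisy


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true, y_pred, cls):
    """Harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Corrupt 20% of the labels of a toy two-class training set.
classes = ["web", "p2p"]
clean_labels = ["web"] * 50 + ["p2p"] * 50
noisy_labels = inject_label_noise(clean_labels, 0.2, classes, random.Random(0))
```

Repeating the experiment for several values of `noise_ratio`, and training each candidate algorithm on `noisy_labels` while testing on clean data, gives the kind of accuracy-versus-noise comparison the abstract describes.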
Keywords/Search Tags: Traffic classification, Machine learning, Traffic noise, Noise-resistant, Network management and security