
Network Traffic Classification Based On Clustering And Noisy Data

Posted on: 2021-01-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
Full Text: PDF
GTID: 2438330611954094
Subject: Computer technology
Abstract/Summary:
Accurate classification of network traffic is critical to network security. Many applications use dynamic ports and encryption to avoid detection, so port-based and payload-based classification methods have major shortcomings. Machine learning algorithms have therefore been applied to traffic classification. However, previous studies performed either clustering or classification in isolation. Clustering studies did not address how to label the resulting clusters quickly and efficiently, while supervised studies focused mainly on improving classifier accuracy using authoritative or self-collected datasets. Both require a great deal of manual effort to label the data. To address these problems, we propose a network traffic classification method based on clustering and noisy data that combines unsupervised and supervised machine learning. For the clustering problem, the original datasets are preprocessed with the PCA and Gain Ratio dimensionality reduction approach proposed in this thesis, and three hard clustering algorithms (K-Means, Canopy, and Farthest First) are run on the reduced data to study the impact of dimensionality reduction and of the choice of clustering algorithm. For the automatic labeling problem, this thesis first proposes a method that uses Resample to draw a minimal sample from the clustering results and label the clusters with noisy labels, and compares the sampling accuracy with purely manual labeling in real time. Finally, five classification algorithms (J48, Random Forest, Naive Bayes, Bayes Net, and SMO) are trained on the noisy data to produce a noisy classifier and to evaluate the effect of the noisy labels. To ensure ground truth for the experiments, two public datasets are used. The results show that the method is suitable for a variety of datasets and that reducing the dimensionality of the instances effectively improves clustering accuracy. The automatic labeling task also meets the goal of minimal manual labeling, and the trained noisy classifier can effectively identify network application classes.
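
The cluster-then-label idea in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: it assumes a hand-written k-means in place of the K-Means, Canopy, and Farthest First runs, a simple random sample in place of Resample, and majority voting to give each cluster its label. The function names (`kmeans`, `label_clusters`), the two toy features, and the class names are all hypothetical.

```python
import random
from collections import Counter

def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        assign = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def label_clusters(assign, oracle, per_cluster=3, seed=0):
    """Ask the oracle (the manual labeler) about a few sampled flows per
    cluster, then give every flow in the cluster the majority label."""
    rng = random.Random(seed)
    noisy = [None] * len(assign)
    queried = 0
    for c in sorted(set(assign)):
        idx = [i for i, a in enumerate(assign) if a == c]
        sample = rng.sample(idx, min(per_cluster, len(idx)))
        queried += len(sample)
        majority = Counter(oracle(i) for i in sample).most_common(1)[0][0]
        for i in idx:
            noisy[i] = majority
    return noisy, queried

# Toy flows with two made-up features; one "dns" flow sits inside the
# "web" group, so its propagated label will be wrong (label noise).
points = [[1.0 + 0.1 * i, 1.0] for i in range(10)] + \
         [[10.0 + 0.1 * i, 10.0] for i in range(10)]
truth = ["web"] * 9 + ["dns"] + ["p2p"] * 10

assign = kmeans(points, k=2)
noisy, queried = label_clusters(assign, oracle=lambda i: truth[i])
accuracy = sum(n == t for n, t in zip(noisy, truth)) / len(truth)
print(queried, "manual labels for", len(points), "flows; label accuracy:", accuracy)
```

The noisy labels produced this way would then serve as training labels for a supervised classifier, which is the role J48, Random Forest, Naive Bayes, Bayes Net, and SMO play in the thesis.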
Keywords/Search Tags: Traffic classification, Machine learning, Clustering, Classification, Automatic labeling