Network Traffic Classification Based On Clustering And Noisy Data

Posted on:2021-01-08

Degree:Master

Type:Thesis

Country:China

Candidate:Y Liu

GTID:2438330611954094

Subject:Computer technology

Abstract/Summary:

Accurate classification of network traffic is critical to network security. Many applications use dynamic ports and encryption algorithms to avoid detection, so port-based and payload-based classification have major shortcomings. Machine learning algorithms have therefore been applied to traffic classification. However, previous traffic classification studies analyzed the data with either clustering or classification in isolation, and clustering studies did not address how to label the resulting clusters quickly and efficiently. In supervised traffic classification, much research focuses on improving the accuracy of classification algorithms using authoritative or self-collected datasets, both of which require a great deal of manual labor to label the data. To address these problems, we propose a network traffic classification method based on clustering and noisy data that combines unsupervised and supervised machine learning. For the clustering problem, the original datasets are preprocessed with the PCA and Gain Ratio dimensionality reduction approach proposed in this paper, and three hard clustering algorithms (K-Means, Canopy, and Farthest First) are evaluated on the dimensionality-reduced data to study the impact of dimensionality reduction and of the clustering algorithm. For the automatic labeling problem, this paper first proposes a method that minimizes the clustering results and uses Resample to label the clusters with noisy labels, and compares the sampling accuracy with that of purely manual labeling in real time. Finally, five classification algorithms (J48, Random Forest, Naive Bayes, Bayes Net, and SMO) are used to train noisy classifiers on the noisy data and to evaluate the effect of the noisy labels. To ensure ground truth for the experiments, this paper uses two public datasets. The results show that the method is suitable for a variety of datasets and that reducing the dimension of the instances effectively improves clustering accuracy. At the same time, the automatic labeling task reaches the minimum manual labeling requirement, and the trained noisy classifier can effectively identify network application classes.
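
The idea behind labeling clusters with noisy labels can be illustrated with a minimal sketch. It assumes a simple majority-vote rule: a few flows are sampled from each cluster, their labels are requested from an annotator, and the majority label is propagated to every flow in that cluster. The function names, the `ask_label` callback (a stand-in for a human annotator), and the majority-vote rule are assumptions made for illustration. They are not the thesis's procedure, which relies on Resample.

```python
import random
from collections import Counter, defaultdict


def label_clusters_by_sampling(cluster_ids, ask_label, sample_size=3, seed=0):
    """Give every flow the majority label of a small sample from its cluster.

    cluster_ids: list with one cluster index per flow.
    ask_label:   hypothetical callback standing in for a human annotator;
                 it receives a flow index and returns that flow's class.
    Returns (noisy_labels, n_queries), where n_queries is the number of
    flows that had to be labeled by hand.
    """
    rng = random.Random(seed)

    # Group flow indices by the cluster they were assigned to.
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)

    noisy_labels = [None] * len(cluster_ids)
    n_queries = 0
    for idxs in members.values():
        # Only the sampled flows are labeled manually.
        picked = rng.sample(idxs, min(sample_size, len(idxs)))
        n_queries += len(picked)
        votes = Counter(ask_label(i) for i in picked)
        majority = votes.most_common(1)[0][0]
        # Propagate the majority label to the whole cluster; flows of
        # other classes in an impure cluster receive a wrong (noisy) label.
        for i in idxs:
            noisy_labels[i] = majority
    return noisy_labels, n_queries


def label_accuracy(noisy_labels, true_labels):
    """Fraction of flows whose propagated label matches the ground truth."""
    matches = sum(n == t for n, t in zip(noisy_labels, true_labels))
    return matches / len(true_labels)


if __name__ == "__main__":
    # Toy ground truth: cluster 0 is impure (one "p2p" flow among "web").
    true_labels = ["web"] * 4 + ["p2p"] * 6
    cluster_ids = [0] * 5 + [1] * 5
    labels, queries = label_clusters_by_sampling(
        cluster_ids, lambda i: true_labels[i], sample_size=2
    )
    print(queries, label_accuracy(labels, true_labels))
```

In this sketch the manual effort is bounded by `sample_size` per cluster, while the label noise depends on cluster purity. That is why a noisy classifier trained on such labels needs to be evaluated against ground truth.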

Keywords/Search Tags:

Traffic classification, Machine learning, Clustering, Classification, Automatic labeling

Related items

1	Research On Traffic Classification Algorithms Based On Machine Learning
2	Design And Implementation Of Image Labeling System Based On Machine Learning
3	Study And Implementation Of Network Traffic Classification Base On Machine Learning
4	Research On Some Key Issues For Classification And Identification Of Network Traffic
5	Design And Implementation Of Network Traffic Classification System Based On Machine Learning
6	The Research Of Network Traffic Classification And Its Algorithms
7	The Research Of P2P Traffic Classification Based On Machine Learning Algorithms
8	The Research Of P2p Traffic Classification Based On Machine Learning Algorithms
9	Research And Implementation Of Traffic Classification Platform Based On Machine Learning
10	Research On Hierarchical Traffic Classification System For Campus Network