
Classification Of Abnormal Data Flow Based On Machine Learning

Posted on: 2020-05-01
Degree: Master
Type: Thesis
Country: China
Candidate: F Wu
GTID: 2428330590451150
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet and the growing frequency of network use, data traffic has increased dramatically, and many attacks against network services have emerged, posing serious challenges to network security. The identification and classification of malicious packets in data traffic is therefore a focus of defense technology. This thesis first introduces the principle and process of port-based identification and deep packet inspection, and explains that their inherent defects leave them unable to meet the needs of existing networks. It then surveys packet classification and recognition techniques based on machine learning, including the Naive Bayes algorithm, the C4.5 decision tree algorithm, the support vector machine (SVM) algorithm, and the K-Means clustering algorithm, and outlines how each algorithm works. Raw packets are captured with the pcap library functions, and the flow attributes suitable for machine learning are selected as the sample feature set. The classical KDD99 data set is used, and classification accuracy on the test set serves as the evaluation standard. On the basis of the existing algorithms, the thesis proposes two improvement strategies: the first refines the attributes of the training samples by weighting, and the second combines the advantages of K-Means clustering and the SVM. The first strategy targets the continuity and correlation between network packets. In a network environment, packets exist as streams rather than independently, and some attributes of packets of the same type may be identical. The weight of each attribute is therefore taken as the proportion of the number of categories to which the attribute belongs to the total number of instances. The weight is treated as an impact factor: attributes whose weight is essentially 0 are deleted and attributes with multiple values are retained, which reduces training complexity. The experimental results show that training speed improves noticeably while classification accuracy remains essentially unchanged. The second strategy is an integrated model that combines K-Means clustering with the SVM. The data set is first clustered with the K-Means algorithm, yielding several clusters around their centroids, and the resulting clustered data set is then used to train the SVM classifier. This approach avoids the time-consuming manual feature-extraction stage of SVM training and exploits the fast training of an unsupervised clustering algorithm. The experimental results show that the model effectively increases the classification accuracy of the SVM algorithm and shortens training time. The research shows that machine learning methods perform well in traffic classification, and that the improved methods overcome shortcomings of the original machine learning algorithms and achieve more efficient classification.
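The attribute-weighting strategy described in the abstract can be illustrated with a small sketch. The abstract does not give the exact weight formula, so the code below assumes one possible reading: an attribute's weight is the number of distinct values it takes divided by the number of instances, and attributes whose weight does not exceed a threshold are dropped. The function names, the threshold, and the toy records are hypothetical and are not taken from the thesis.

```python
def attribute_weights(rows):
    """Return one weight per column.

    Assumed reading of the abstract: weight = distinct values / instances.
    """
    n = len(rows)
    if n == 0:
        return []
    # zip(*rows) transposes the records so each item is one attribute column.
    return [len(set(column)) / n for column in zip(*rows)]


def select_attributes(rows, threshold):
    """Keep only the columns whose weight is strictly above the threshold."""
    weights = attribute_weights(rows)
    keep = [i for i, w in enumerate(weights) if w > threshold]
    reduced = [[row[i] for i in keep] for row in rows]
    return keep, reduced


# Toy flow records (hypothetical): protocol, land flag, service, duration.
rows = [
    ("tcp", 0, "http", 0),
    ("tcp", 0, "smtp", 2),
    ("tcp", 0, "http", 5),
    ("tcp", 0, "ftp", 1),
]

# A constant column has weight 1/n, so 1/n is used as the cut-off here.
keep, reduced = select_attributes(rows, threshold=1 / len(rows))
print(keep)     # indices of the retained attributes
print(reduced)  # records restricted to those attributes
```

In this toy data the protocol and land-flag columns never vary, so they are removed and only the service and duration columns reach the classifier. A real pipeline would likely need a different cut-off, and separate handling for continuous attributes, which always score high under this reading.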
Keywords/Search Tags:traffic classification, machine learning, impact factor, integrated model