
The Design And Implementation Of Anomalous Network Traffic Detection System Based On Spark

Posted on: 2020-07-26
Degree: Master
Type: Thesis
Country: China
Candidate: J C Zhou
Full Text: PDF
GTID: 2428330572476388
Subject: Electronic and communication engineering
Abstract/Summary:
Network security has long been one of the world's most pressing concerns. With the rapid development of Internet technology, the network security environment has deteriorated sharply, so designing an anomalous network traffic detection system for the current network environment is both urgent and worthwhile.

This paper designs an anomalous traffic detection system based on Spark. The system collects packet features from hosts, runs predictions on the collected features, and displays the results to tell users whether anomalous traffic is present. It is a quasi-real-time streaming system divided into five modules: the Feature Collection Module, Gather Module, Prediction Module, Report Module, and Training Model Module.

1. The Feature Collection Module collects 31-dimensional features from IP packets using JnetPcap, which is cross-platform and runs on both Windows and Linux. The features fall into three categories: basic features of the TCP connection (12 dimensions), time-based TCP connection statistics (8 dimensions), and host-based TCP connection statistics (11 dimensions). Once collected, the features are sent to the Gather Module.
2. The Gather Module gathers the features, applies an initial filter, and sends them to a Kafka topic. The Prediction Module consumes the topic and predicts on the traffic.
3. The Prediction Module contains two models. KMeans_RandomForest_Model is a supervised cascade model that combines the K-Means and Random Forest algorithms; its advantage is high prediction accuracy. Streaming_KMeans_Model is built on the Streaming K-Means algorithm, an unsupervised learning algorithm; its advantage is that it does not require labeled data. This model is also trained as it makes predictions on incoming data, and it provides a parameter named the declining factor, which gives more recent data a greater impact on the model. Both models were designed using new features constructed in this paper.
4. The Report Module shows the prediction results: a table of all traffic, a table of anomalous traffic, a histogram of the weights in Streaming_KMeans_Model, and a pie chart of the predicted values for the network traffic.
5. The Training Model Module differs from the others in that it does not run in the workflow. It uses the training data set to train the model that supports the Prediction Module.

Finally, this paper uses part of the IDS2017 data (Intrusion Detection Evaluation Dataset) to verify the validity of KMeans_RandomForest_Model and Streaming_KMeans_Model. KMeans_RandomForest_Model achieves 97.4% accuracy, and the unsupervised Streaming_KMeans_Model achieves 70.2% accuracy. In addition, this paper builds a Hadoop and Spark system on three virtual machines and conducts experiments. The experimental results show that:

1. The modules coordinate with each other, and the system is usable.
2. As the number of virtual machines in use increases, the processing speed of predictions increases.
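The effect of the declining factor in Streaming_KMeans_Model can be illustrated with a minimal sketch in plain Python (no Spark required). It assumes the update rule documented for Spark MLlib's streaming k-means, in which each center's accumulated weight is multiplied by a decay value before a new batch is folded in. Whether the thesis uses exactly this rule is an assumption, and the names `Cluster`, `nearest`, `update`, and `decay` are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Cluster:
    center: List[float]
    weight: float  # discounted count of points absorbed so far


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(clusters: List[Cluster], point: Sequence[float]) -> int:
    """Return the index of the cluster whose center is closest to the point."""
    return min(range(len(clusters)),
               key=lambda i: squared_distance(clusters[i].center, point))


def update(clusters: List[Cluster], batch: List[Sequence[float]],
           decay: float) -> List[int]:
    """Predict every point in the batch, then fold the batch into the model.

    Assumed rule: new_center = (center * weight * decay + batch_sum)
                               / (weight * decay + batch_count).
    decay = 1.0 keeps all history; decay = 0.0 keeps only the latest batch.
    """
    dim = len(clusters[0].center)
    sums = [[0.0] * dim for _ in clusters]
    counts = [0] * len(clusters)
    assignments = []
    for point in batch:
        i = nearest(clusters, point)  # prediction uses the current centers
        assignments.append(i)
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += point[d]
    for i, cluster in enumerate(clusters):
        old = cluster.weight * decay  # discount the history
        total = old + counts[i]
        if total > 0:
            cluster.center = [(cluster.center[d] * old + sums[i][d]) / total
                              for d in range(dim)]
        cluster.weight = total  # assumption: the discounted weight is carried forward
    return assignments
```

Calling `update(clusters, batch, decay=0.5)` once per micro-batch returns the predicted cluster index for each point and then adjusts the centers, which mirrors the abstract's statement that the model is trained while it predicts. The per-cluster `weight` values in this sketch play the role of the weights that the Report Module plots as a histogram.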
Keywords/Search Tags: anomaly traffic, K-Means, Random Forest, Spark-Streaming, streaming calculation