
Research And Implementation Of Network Traffic Anomaly Detection Based On Spark Platform

Posted on: 2022-07-04
Degree: Master
Type: Thesis
Country: China
Candidate: R Yang
GTID: 2518306506463474
Subject: Computer technology
Abstract/Summary:
With the rapid development of Internet technology, the network security environment faces serious problems. Network traffic anomaly detection is the foundation of network security assurance and an important part of network security research. Machine learning offers a new way to approach it: traffic anomaly detection can be treated as a classification problem, in which a classification model is built and then used to discriminate and classify traffic data. However, in today's big data era, network traffic is growing explosively, and traditional stand-alone machine learning environments can no longer handle anomaly detection on such massive data. In addition, network traffic data is high-dimensional and contains many redundant features, which seriously degrade both the accuracy and the efficiency of traffic anomaly detection. To address these problems, and with the aim of protecting network security, this thesis proposes a Spark-based network traffic anomaly detection method that combines feature selection with machine learning classification. The main research contents are as follows:

(1) A feature selection method combining mutual information and the firefly algorithm is proposed. The method first computes the mutual information between each feature and the class label, which measures how strongly the two are associated, ranks the features by this value in descending order, and selects a high-quality feature subset. The firefly algorithm is then used to search the original feature set for the best feature subset, with an adaptive strategy for adding and deleting features between iterations. Finally, a voting strategy is applied to the subsets produced by the two methods to obtain the final feature subset. Experimental results show that this feature selection method effectively improves classification performance and reduces model detection time.

(2) A network traffic anomaly detection method based on a weighted-voting random forest is proposed and implemented in parallel on the Spark platform. The thesis first studies the weighted-voting random forest model for traffic anomaly detection and then parallelizes the algorithm on Spark. Weighting the votes increases the influence of decision trees with strong classification ability and reduces the influence of trees with weak classification ability. Experimental results show that, compared with the original random forest and other algorithms, the proposed method achieves higher accuracy and F-measure (F1). In addition, compared with random forest in a stand-alone environment, it greatly reduces data processing time, making it well suited to large-scale network traffic anomaly detection in the big data era.

(3) A prototype network traffic anomaly detection system based on the Spark platform is designed and implemented. The system performs data collection, preprocessing, and feature selection, and stores the processed data in a distributed file system. It then applies the weighted-voting random forest algorithm on Spark to detect abnormal traffic and displays the final detection results in a web interface, enabling detection on large-scale traffic data.
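The first stage of item (1), ranking features by mutual information with the class label, and the final voting step can be sketched in plain Python. This is a minimal illustration assuming discrete (already binned) feature values; the function names, the `min_votes` threshold, and the toy data are hypothetical, since the abstract does not say how votes are counted or how continuous features are discretized.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """I(X;Y) in bits between two equal-length sequences of discrete values."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    px, py = Counter(feature), Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_by_mi(rows, labels, k):
    """Indices of the k features with the highest MI, ties broken by index."""
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scores.append((mutual_information(column, labels), j))
    scores.sort(key=lambda s: (-s[0], s[1]))  # descending MI
    return [j for _, j in scores[:k]]


def vote_subsets(subsets, min_votes):
    """Keep features chosen by at least `min_votes` subsets (hypothetical rule)."""
    votes = Counter(j for subset in subsets for j in set(subset))
    return sorted(j for j, v in votes.items() if v >= min_votes)


# Toy usage: feature 0 equals the label, features 1 and 2 are uninformative.
rows = [[0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1]]
labels = [0, 0, 1, 1]                       # 0 = normal, 1 = anomalous
mi_subset = rank_by_mi(rows, labels, k=2)   # → [0, 1]
final = vote_subsets([mi_subset, [0, 2]], min_votes=2)  # → [0]
```

With only two voters, `min_votes=2` reduces to the intersection of the subsets and `min_votes=1` to their union; which of these (or something else) the thesis uses is not stated in the abstract.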
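The firefly search in item (1) can be illustrated with a generic binary firefly algorithm: each firefly holds a continuous position, a sigmoid maps every coordinate to the probability that the corresponding feature is selected, and dimmer fireflies move toward brighter ones with attractiveness `beta0 * exp(-gamma * r^2)` plus a small random step. This sketch does not reproduce the thesis's adaptive add/delete strategy, and all parameter values and the toy fitness function are assumptions; in practice the fitness would typically be a classifier's validation score, possibly penalized by subset size.

```python
import math
import random


def binary_firefly(n_features, fitness, n_fireflies=10, n_iter=30,
                   beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Return (selected feature indices, best fitness) for a boolean mask search."""
    rng = random.Random(seed)

    def to_mask(pos):
        # Sigmoid transfer: coordinate p selects its feature with prob 1/(1+e^-p).
        return [rng.random() < 1 / (1 + math.exp(-p)) for p in pos]

    positions = [[rng.uniform(-1, 1) for _ in range(n_features)]
                 for _ in range(n_fireflies)]
    masks = [to_mask(p) for p in positions]
    scores = [fitness(m) for m in masks]
    best = max(range(n_fireflies), key=scores.__getitem__)
    best_mask, best_score = masks[best][:], scores[best]

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if scores[j] <= scores[i]:
                    continue  # only move toward a brighter firefly
                r2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
                beta = beta0 * math.exp(-gamma * r2)  # attractiveness decays with distance
                positions[i] = [a + beta * (b - a) + alpha * (rng.random() - 0.5)
                                for a, b in zip(positions[i], positions[j])]
                masks[i] = to_mask(positions[i])
                scores[i] = fitness(masks[i])
                if scores[i] > best_score:
                    best_mask, best_score = masks[i][:], scores[i]
    return [k for k, on in enumerate(best_mask) if on], best_score


# Hypothetical toy fitness: features 0 and 2 are useful, every other one costs 0.5.
TARGET = {0, 2}


def toy_fitness(mask):
    return sum((1.0 if k in TARGET else -0.5) for k, on in enumerate(mask) if on)


selected, score = binary_firefly(5, toy_fitness, seed=1)
```

Because the brightest solution seen so far is tracked separately, the returned score never falls below the best initial firefly, whatever path the swarm takes.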
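The weighted voting in item (2) replaces the usual one-tree-one-vote majority with a vote in which each tree counts in proportion to a weight. The abstract does not say how the weights are derived; this sketch assumes each tree is weighted by its accuracy on a held-out validation set, and stands in for trained decision trees with simple hypothetical prediction functions. It runs in a single process and does not show the Spark parallelization.

```python
from collections import defaultdict


def tree_weights(trees, X_val, y_val):
    """Weight each tree by its accuracy on a validation set (assumed scheme)."""
    weights = []
    for tree in trees:
        correct = sum(1 for x, y in zip(X_val, y_val) if tree(x) == y)
        weights.append(correct / len(y_val))
    return weights


def weighted_vote(trees, weights, x):
    """Return the class whose voting trees have the largest total weight."""
    totals = defaultdict(float)
    for tree, w in zip(trees, weights):
        totals[tree(x)] += w
    return max(totals, key=totals.get)


# Hypothetical "trees" on one numeric feature: a good one, a constant one,
# and an inverted (always wrong) one.
trees = [
    lambda x: int(x[0] > 0.5),
    lambda x: 1,
    lambda x: 1 - int(x[0] > 0.5),
]
X_val = [(0.1,), (0.9,), (0.2,), (0.8,)]
y_val = [0, 1, 0, 1]
weights = tree_weights(trees, X_val, y_val)          # → [1.0, 0.5, 0.0]
weighted = weighted_vote(trees, weights, (0.3,))     # → 0
unweighted = weighted_vote(trees, [1, 1, 1], (0.3,))  # → 1
```

On the sample `(0.3,)`, plain majority voting follows the two weak trees and predicts class 1, while the weighted vote follows the accurate tree and predicts class 0, which is the effect the abstract describes.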
Keywords/Search Tags: Spark, Feature Selection, Mutual Information, Firefly Algorithm, Random Forest, Traffic Anomaly Detection