
Design And Implementation Of Distributed Network Intrusion Detection System Based On Spark

Posted on: 2022-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: H Li
GTID: 2518306605989889
Subject: Master of Engineering
Abstract/Summary:
The continuous construction and improvement of network infrastructure has gradually lowered the cost and entry barrier of developing network attacks. At the same time, the continuous growth of network scale, increasingly complex and diverse network structures, and the constant iteration of intrusion techniques pose major challenges for network intrusion detection. Intrusion detection systems based on pattern matching are well developed, but they fall short in new large-scale, high-speed network traffic environments, where real-time and accurate detection is difficult to achieve. Machine-learning-based intrusion detection methods offer high accuracy and strong scalability and can predict the categories of previously unseen data, which makes them a current research hotspot in the field.

The Spark distributed framework, built on in-memory computing, offers high performance, low latency, and high fault tolerance. It is especially suitable for workloads that require many iterative computations, which gives it a natural advantage in machine learning, and it has been widely used in cloud computing and big data applications. To cope with the challenges that the complexity and scale of network data pose to intrusion detection systems, this thesis improves the random forest algorithm, builds an intrusion detection model on it, and designs and implements a distributed network intrusion detection system on the Spark in-memory distributed computing framework. The main work of this thesis is as follows:

First, the traditional random forest algorithm achieves limited classification accuracy because of the high feature dimensionality of network traffic data. To address this, the thesis improves the random forest algorithm with a hierarchical feature selection strategy based on ReliefF and with decision tree weighting, and builds the intrusion detection model on the improved algorithm. Experimental comparison with the support vector machine algorithm and the traditional random forest algorithm shows that the resulting model improves the accuracy and detection rate of intrusion detection.

Second, to process large-scale network traffic data, the thesis parallelizes and optimizes the construction of the intrusion detection model on Spark, which reduces decision tree generation time and improves the efficiency of model construction. It also implements distributed computation of the model, which further improves the efficiency and performance of intrusion detection.

Third, building on the constructed intrusion detection model and on Spark, the thesis designs a distributed network intrusion detection system comprising a network traffic collection module, a data transmission and processing module, an intrusion detection module, and an intrusion management module.

Fourth, the thesis presents the detailed design and implementation of each functional module. The network traffic collection module collects network traffic data in a distributed manner; the data transmission and processing module transmits and preprocesses that data in a distributed manner; the intrusion detection module produces the intrusion classification results; and the intrusion management module visualizes intrusion warnings on the front end.

Finally, system testing shows that, compared with a traditional pattern-matching-based intrusion detection system, the system implemented in this thesis meets the requirements of near-real-time processing and efficient detection of large-scale network data.
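
The abstract does not give the formulas behind the feature selection strategy or the decision tree weighting, so the following is only a minimal sketch of the general idea, not the thesis's implementation. It assumes that ReliefF relevance scores are already computed, that features are split into two strata at the mean score, and that each tree's vote is weighted by something like its validation accuracy. The function names `stratified_feature_sample` and `weighted_vote` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_feature_sample(relief_weights, k, rng):
    """Pick k feature indices, drawing from a strong and a weak stratum.

    relief_weights: per-feature relevance scores, assumed to be precomputed
    (for example by ReliefF). The split at the mean score and the
    proportional allocation are illustrative assumptions.
    """
    n = len(relief_weights)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of features")

    mean_w = sum(relief_weights) / n
    strong = [i for i, w in enumerate(relief_weights) if w >= mean_w]
    weak = [i for i, w in enumerate(relief_weights) if w < mean_w]

    # Allocate draws in proportion to stratum size, but always keep at
    # least one strong feature so every tree sees an informative feature.
    k_strong = min(len(strong), max(1, round(k * len(strong) / n)))
    k_weak = min(len(weak), k - k_strong)
    k_strong = k - k_weak  # top up from the strong stratum if needed

    return sorted(rng.sample(strong, k_strong) + rng.sample(weak, k_weak))


def weighted_vote(tree_predictions, tree_weights):
    """Return the label with the largest summed tree weight.

    tree_weights: one non-negative weight per tree; using validation
    accuracy as the weight is an assumption made for this sketch.
    """
    if len(tree_predictions) != len(tree_weights):
        raise ValueError("one weight is required per tree prediction")

    scores = defaultdict(float)
    for label, weight in zip(tree_predictions, tree_weights):
        scores[label] += weight
    return max(scores, key=scores.get)
```

In this sketch, a single high-weight tree that predicts an attack can outvote two low-weight trees that predict normal traffic, which is the behavior a weighted forest is meant to have when some trees are trained on less informative feature subsets.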
Keywords/Search Tags:Network intrusion detection, Spark, Random forest, Stratified feature sampling, Distributed