The Research Of Real-time Network Traffic Anomaly Detection Based On Spark Technology

Posted on: 2017-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhou
GTID: 2348330488988809
Subject: Electronic and communication engineering
Abstract/Summary:
Following Hadoop in cloud computing, Spark has emerged as a new generation of open-source, in-memory technology for parallel computing. It has a strong advantage in machine learning, is especially well suited to algorithms that iterate many times, and is widely applied in interactive queries, cloud computing, graph computing, and other fields. Spark has robust fault-tolerance and scheduling mechanisms that keep the system running stably. At the same time, it is a unified computing framework that combines SQL, machine learning, graph computing, stream computing, and other functions in a single project, which makes it easy to use. Spark now provides a complete big data processing ecosystem with its own capabilities in each area, including stream processing, graph processing, machine learning, and NoSQL queries. In addition, Spark takes a full-stack approach to the key data-processing problems of cloud computing, which has made it a hot research topic in that field.

This thesis reviews domestic and international research on Spark, points out the current difficulties, and studies open application problems in network traffic anomaly detection, building on the components of the Spark framework. It makes two main contributions. First, it applies network traffic anomaly detection on the Spark platform. With MLlib as the algorithm library, Streaming K-means and random forest serve as the first-level and second-level intrusion detection models, respectively, and detect network traffic data at different stages. In addition, the principle of the K-means algorithm is introduced, and the algorithm is optimized for network traffic anomaly detection.
A Z-score is used to filter out edge information, and the optimal K-means model is selected on the basis of information entropy; this is the key to the first-level network traffic anomaly detection. The output of the first-level detection model is used as input to the second-level detection model, which is built with the random decision forest algorithm.

Second, the thesis compares these algorithms experimentally. With KDD99 as the test data set, the optimal K-means model is judged by information entropy. Cross-validation shows that the model's prediction accuracy and information entropy are inversely related to the value of K, and that the K-means model is optimal at K = 60. Random forest and decision tree models are then tested on predicting abnormal data under different combinations of hyper-parameters. The experimental results show that random forest predicts more than 98% of the abnormal records, and that the two-level anomaly detection model achieves higher detection accuracy on abnormal data than the traditional model.
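
As a rough illustration of the first-level steps described above, the sketch below standardizes features with a Z-score and scores a clustering by the size-weighted entropy of the labels inside each cluster, where a lower score means purer clusters. This is a minimal plain-Python sketch, not the thesis's Spark/MLlib implementation. The function names (`zscore_columns`, `weighted_cluster_entropy`), the use of column-wise standardization, and the weighting scheme are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def zscore_columns(rows):
    """Standardize each column to zero mean and unit (population) std.

    Assumption: the abstract's "Z-score" step is read here as plain
    column-wise standardization of numeric features.
    """
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n)
            for c, m in zip(cols, means)]
    # A constant column has std 0; map it to 0.0 instead of dividing by zero.
    return [[(x - m) / s if s > 0 else 0.0
             for x, m, s in zip(row, means, stds)]
            for row in rows]


def entropy(counts):
    """Shannon entropy (natural log) of a list of category counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def weighted_cluster_entropy(cluster_ids, labels):
    """Average label entropy per cluster, weighted by cluster size.

    Hypothetical scoring rule: candidate K values would be compared by
    this score, with lower values indicating purer clusters.
    """
    by_cluster = defaultdict(Counter)
    for cid, label in zip(cluster_ids, labels):
        by_cluster[cid][label] += 1
    n = len(labels)
    return sum(sum(cnt.values()) / n * entropy(list(cnt.values()))
               for cnt in by_cluster.values())
```

In use, one would fit K-means for several candidate K values on the standardized features, compute `weighted_cluster_entropy` from each model's cluster assignments and the known labels, and keep the K with the best trade-off. The abstract reports K = 60 as optimal on KDD99.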
Keywords/Search Tags:Spark, Machine Learning, Network Traffic Anomaly Detection, K-Means, Random Forest