Research On Traffic Anomaly Pattern In Intrusion Detection Based On Machine Learning

Posted on: 2019-06-13  Degree: Master  Type: Thesis
Country: China  Candidate: D Y Fu  Full Text: PDF
GTID: 2348330542998870  Subject: Information and Communication Engineering
Abstract/Summary:
While people enjoy the great convenience brought by Internet technology and new types of online business, they also face an increasingly serious network security problem. Because a firewall cannot handle every network security problem alone, the intrusion detection system, an important device that works alongside the firewall, has become one of the hot research directions in network security. Because traditional intrusion detection systems based on expert systems and rules have clear weaknesses, machine learning, which can fit extremely complex functions, has become an excellent solution for intrusion detection. However, two main factors affect the robustness of machine learning models for intrusion detection: one is the traffic class imbalance problem; the other is the non-identical distribution problem caused by the time-varying distribution of network traffic.

In this thesis, the distribution of samples in the feature space is regarded as the essence of many machine learning problems. The thesis introduces the concept of a "pattern" to describe, explain and apply this distribution in order to solve those problems. Based on the idea of patterns, a multi-level, semi-supervised machine learning framework for intrusion detection systems (MSML-IDS) is proposed. It comprises four modules: pure cluster extraction, pattern discovery, fine-grained classification of unknown patterns, and model updating. In the pure cluster extraction module, the thesis defines the "pure cluster pattern" and proposes a hierarchical semi-supervised k-means algorithm (HSK-means) to find all the pure clusters. Because pure clusters contain a large number of samples that can be predicted precisely, removing them reduces the class imbalance faced by the subsequent models. In the pattern discovery module, the thesis defines the "unknown pattern" and applies a cluster-based method and a one-class-SVM-based method, respectively, to find unknown patterns; each test sample is then assigned either to a labeled known pattern or to an unlabeled unknown pattern. The fine-grained classification module performs fine-grained classification of the unknown-pattern samples. The model updating module provides a mechanism for retraining at regular intervals.

The KDDCUP99 data is used to create an identical-distribution dataset and a non-identical-distribution dataset. For model evaluation, besides the general indicators, the thesis defines several characteristic indicators, such as the "basic accuracy rate" and the "known pattern accuracy rate". It studies the relationship between these characteristic indicators and the overall accuracy rate and, based on this relationship, derives a preference for selecting model training methods and parameters. The experimental results show that MSML-IDS is strongly robust on both the identical-distribution and the non-identical-distribution datasets. On the identical-distribution dataset, the overall accuracy rate of MSML-IDS reaches 99.95%. On the non-identical-distribution dataset, the overall accuracy rate reaches 96.6%; the F1 score of every category improves compared with the baseline model; the F1 score of the "mouse traffic" categories increases significantly; the known pattern accuracy rate reaches 99.3%; and the results of unknown pattern recognition are also as expected.
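The abstract does not give the details of HSK-means, so the following is only a minimal sketch of the general idea of pure cluster extraction, not the thesis's algorithm. It assumes that a cluster counts as "pure" when all of its samples share one label, that impure clusters are split again with plain k-means, and that labels are used only for the purity test. The function names and the parameters `k`, `min_size` and `max_depth` are hypothetical, as is the toy data.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        assign = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def extract_pure_clusters(points, labels, k=2, min_size=5, max_depth=4):
    """Return (pure, rest).

    pure: list of (label, member_indices) for single-label clusters of at least min_size.
    rest: indices of samples left for later stages (assumed behaviour, not from the thesis).
    """
    pure, rest = [], []

    def recurse(idx, depth):
        if len({labels[i] for i in idx}) == 1:
            # Single-label cluster: keep it if large enough, otherwise pass it on.
            if len(idx) >= min_size:
                pure.append((labels[idx[0]], idx))
            else:
                rest.extend(idx)
            return
        if depth >= max_depth or len(idx) < k:
            rest.extend(idx)
            return
        assign = kmeans([points[i] for i in idx], k, seed=depth)
        groups = [[i for i, a in zip(idx, assign) if a == c] for c in range(k)]
        groups = [g for g in groups if g]
        if len(groups) < 2:
            # k-means could not split this cluster any further.
            rest.extend(idx)
            return
        for g in groups:
            recurse(g, depth + 1)

    recurse(list(range(len(points))), 0)
    return pure, rest


# Hypothetical toy data: two separated groups plus two overlapping samples.
points = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [0.0, 0.3], [0.3, 0.1],
    [5.0, 5.1], [5.2, 5.0], [5.1, 5.2], [5.0, 5.3], [5.3, 5.1],
    [2.5, 2.5], [2.6, 2.4],
]
labels = ["normal"] * 5 + ["dos"] * 5 + ["normal", "dos"]

pure, rest = extract_pure_clusters(points, labels, min_size=3)
print("pure clusters:", pure)
print("remaining samples:", rest)
```

In this sketch the samples in `rest` are what a later classifier would be trained on; dropping the large single-label clusters first is how the class imbalance would be reduced under the assumptions above.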
Keywords/Search Tags:Machine learning, Semi-supervised Learning, Pattern, Intrusion Detection, KDDCUP99