Research On Anomaly Detection Scheme For Unlabeled Traffic Data

Posted on: 2020-12-08
Degree: Master
Type: Thesis
Country: China
Candidate: J T Chen
Full Text: PDF
GTID: 2518306308467024
Subject: Computer Science and Technology
Abstract/Summary:
Many countries and organizations now recognize the importance of protecting computer networks, and network traffic anomaly detection is an important research problem in network security. Some scholars have proposed detection methods based on machine learning, but these methods have several problems: clustering accuracy is not high enough because positive and negative samples in traffic data sets are imbalanced; the cluster-labeling process for unlabeled traffic data is not detailed enough; and feature extraction from anonymized traffic data sets is not good enough under anonymization rules. In response, this thesis presents an optimized machine-learning-based scheme for network traffic anomaly detection that directs the labor and material costs of detection toward traffic data that is likely to be anomalous, which improves detection efficiency and saves cost. The research improves the sample clustering scheme, sensitive feature selection, classifier optimization, and unsupervised detection scheme used in detecting abnormal network traffic. The scheme was verified on the open-source KDDCup99 data set, where it improved on the effectiveness of traditional schemes. The main results are as follows.

First, to address the low clustering accuracy caused by the imbalance of positive and negative samples in traffic data sets, a traffic sample clustering scheme based on WCS and BCS values is proposed. WCS and BCS value information is introduced to give the traditional K-means clustering algorithm new class attributes; a support vector machine and SMOTE are introduced to adjust the classification hyperplane; and confusing points in the samples are removed through the idea of cluster splitting and fusion. Under the RF classifier, the scheme achieves a better average AUC and time overhead than previous research schemes. Across the four sub-datasets, the measured results increased by an average of 0.023.

Second, to address the poor feature extraction from anonymized traffic data sets under anonymization rules, a traffic-sensitive feature selection scheme based on an improved OLA-GA is proposed. The algorithm introduces the Gini index and the idea of the chi-square test to improve the calculation of feature contribution, integrates OLA decision rules into the subset search process, and finally introduces the GA fitness function to optimize the machine learning classifier used in the experiments. Compared with previous methods, the scheme shows good detection results and stability on every data set used and at every K threshold of the OLA model. The average improvement is 0.021 for OLA-RF and 0.027 for OLA-GA.

Third, to address the insufficiently detailed cluster-labeling process for unlabeled traffic data, an unsupervised traffic detection scheme based on SM-CAL is proposed. Building on the CAL model of Kim et al., the degree of violation is mapped to continuous values through the Softmax function to improve the labeling result of the model's Label attribute. Improving the labeling process also improves the selection of measurement units and the screening of experimental samples. The final detection results are better than those of previous methods on every data set. While improving SM-CAL, the threshold N suited to the data set was also obtained through experimental comparison and used to refine the detection scheme: a threshold of 0.55 gives better detection results than the default threshold of 0.5.
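The effect of moving the labeling threshold from 0.5 to 0.55 can be illustrated with a minimal sketch. It is not the thesis's SM-CAL implementation, whose details the abstract does not give. It assumes each sample comes with two hypothetical raw scores (one for "normal", one for "anomalous") standing in for the degree of violation, maps them to a continuous value with Softmax, and labels the sample by comparing that value with a threshold. The names `softmax`, `label_sample`, and the example scores are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_sample(normal_score, anomaly_score, threshold=0.55):
    """Label one sample from two hypothetical raw scores.

    The scores are an assumed stand-in for the "degree of violation";
    the abstract does not define how SM-CAL computes them.
    Returns (label, p_anomaly), where p_anomaly is the Softmax output
    for the anomalous class.
    """
    p_anomaly = softmax([normal_score, anomaly_score])[1]
    label = "anomalous" if p_anomaly >= threshold else "normal"
    return label, p_anomaly


if __name__ == "__main__":
    # Made-up scores: clearly normal, borderline, clearly anomalous.
    samples = [(2.0, 0.5), (1.0, 1.1), (0.2, 1.9)]
    for normal_score, anomaly_score in samples:
        for threshold in (0.5, 0.55):
            label, p = label_sample(normal_score, anomaly_score, threshold)
            print(f"scores=({normal_score}, {anomaly_score}) "
                  f"threshold={threshold} p_anomaly={p:.3f} -> {label}")
```

In this sketch only the borderline sample changes label between the two thresholds, which is the kind of case a tuned threshold is meant to settle. Whether 0.55 is the better choice depends on the data set, as the abstract notes for its own experiments.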
Keywords/Search Tags:Network traffic detection, Machine learning, Sample clustering, Feature selection, Unsupervised learning