
Research On Network Anomaly Detection Method Based On Semi-supervised Learning Strategy

Posted on: 2020-09-09   Degree: Master   Type: Thesis
Country: China   Candidate: D M Li
GTID: 2428330596968139   Subject: Computer Science and Technology
Abstract/Summary:
As network attacks become more frequent and severe, network security becomes increasingly important. In recent years, as the performance of traditional approaches has degraded, machine learning-based methods have come to play an important role. In practice, however, obtaining a completely labeled dataset is exceptionally challenging: labeling is expensive and time-consuming and requires specialized technicians to do accurately, whereas unlabeled data are relatively easy to obtain. When only a small amount of labeled data is available, traditional supervised learning algorithms fail to exploit the large amount of readily available unlabeled data, which limits their usefulness in practice. A semi-supervised learning algorithm, which uses both labeled and unlabeled samples, can significantly improve learning effectiveness and is therefore better suited to practical network applications, so research on semi-supervised methods is necessary. In this paper, we propose a semi-supervised model that uses a large amount of existing unlabeled data to improve the classification ability and classification performance of the classifier.

The disagreement-based approach is an important branch of semi-supervised learning. It has evolved from the original Co-training algorithm for multi-view learning, through the cross-validation Co-training algorithm, to the current Tri-training algorithm, and it has been validated as effective. Tri-training is a semi-supervised learning algorithm with strong generalization ability that can effectively improve detection accuracy. However, incorrectly labeled data introduce noise, and this negative impact can offset the benefits of using large amounts of unlabeled data. This paper therefore improves the traditional Tri-training algorithm. The confidence of each unlabeled sample is estimated, and confidence filtering is performed to reduce the possibility that incorrectly labeled data are added to the labeled data. The confidence is also used as the weight of each sample to reduce the impact of erroneous data on the model, thus improving the accuracy of the algorithm. Experiments show that the improvement to Tri-training is effective, achieving a higher detection rate and a faster detection speed.

As one of the basic machine learning classification methods, the decision tree is fast and accurate, and practical network detection requires high detection speed. This paper therefore uses the decision tree algorithm as the base classifier. Combined with an ensemble learning algorithm, the decision tree classifier is used as the weak classifier of the improved Tri-training algorithm to further reduce the proportion of incorrectly labeled data. Once the improved Tri-training algorithm has been trained, the final classification model is generated, ultimately achieving accurate and rapid classification of network traffic. Experiments show that the proposed system performs well in network traffic detection and achieves good detection results even when the training set contains only a small amount of labeled data. Compared with the semi-supervised detection model proposed in previous work, the system performs better in both accuracy and running time on the NSL-KDD dataset; on the Kyoto dataset, it achieves a good balance between accuracy and time cost.
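The abstract does not spell out how confidence is estimated or how the ensemble is built, so the following is only a minimal Python sketch of confidence-filtered, confidence-weighted Tri-training under stated assumptions. The base learner is a hypothetical weighted decision stump (a depth-1 decision tree) standing in for the thesis's ensemble of decision trees; a sample's confidence is assumed to be the mean probability that the two agreeing classifiers assign to their shared label; `threshold`, `rounds`, and the names `Stump`, `tri_train`, and `vote` are illustrative; and the error-rate update conditions of the original Tri-training algorithm are omitted.

```python
import random
from collections import Counter


def class_dist(ys, ws, labels):
    """Weighted class distribution of one leaf (weights are assumed positive)."""
    total = sum(ws)
    return {c: sum(w for y, w in zip(ys, ws) if y == c) / total for c in labels}


class Stump:
    """Hypothetical base learner: a weighted depth-1 decision tree."""

    def fit(self, X, y, w):
        labels = sorted(set(y))
        overall = class_dist(y, w, labels)
        best = None
        for f in range(len(X[0])):
            for t in sorted({x[f] for x in X}):
                err, leaves = 0.0, []
                for side in (True, False):  # left leaf (x <= t), then right leaf (x > t)
                    idx = [i for i, x in enumerate(X) if (x[f] <= t) == side]
                    if not idx:
                        leaves.append(overall)  # empty leaf falls back to overall distribution
                        continue
                    d = class_dist([y[i] for i in idx], [w[i] for i in idx], labels)
                    err += sum(w[i] for i in idx) * (1.0 - max(d.values()))  # weighted error
                    leaves.append(d)
                if best is None or err < best[0]:
                    best = (err, f, t, leaves)
        _, self.f, self.t, self.leaves = best
        return self

    def predict_proba(self, x):
        return self.leaves[0] if x[self.f] <= self.t else self.leaves[1]

    def predict(self, x):
        p = self.predict_proba(x)
        return max(p, key=p.get)


def tri_train(L_X, L_y, U_X, rounds=5, threshold=0.8, seed=0):
    """Tri-training with confidence filtering and confidence-weighted pseudo-labels."""
    rng = random.Random(seed)
    n = len(L_X)
    clfs = []
    for _ in range(3):  # three initial classifiers on bootstrap samples of the labeled set
        idx = [rng.randrange(n) for _ in range(n)]
        clfs.append(Stump().fit([L_X[i] for i in idx], [L_y[i] for i in idx], [1.0] * n))
    for _ in range(rounds):
        new_clfs = []
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            X, y, w = list(L_X), list(L_y), [1.0] * n  # labeled samples keep weight 1
            for x in U_X:
                a = clfs[j].predict(x)
                if clfs[k].predict(x) != a:
                    continue  # the other two classifiers disagree: do not pseudo-label
                # Assumed confidence: mean probability the two give their shared label.
                conf = (clfs[j].predict_proba(x).get(a, 0.0)
                        + clfs[k].predict_proba(x).get(a, 0.0)) / 2
                if conf >= threshold:  # confidence filtering
                    X.append(x)
                    y.append(a)
                    w.append(conf)  # confidence used as the sample weight
            new_clfs.append(Stump().fit(X, y, w))
        clfs = new_clfs  # all three were labeled using the previous round's classifiers
    return clfs


def vote(clfs, x):
    """Final prediction by majority vote of the three classifiers."""
    return Counter(c.predict(x) for c in clfs).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy one-dimensional data, purely illustrative (not traffic features).
    L_X = [[0.05], [0.1], [0.15], [0.2], [0.25], [0.75], [0.8], [0.85], [0.9], [0.95]]
    L_y = [0] * 5 + [1] * 5
    U_X = [[i / 10 + 0.02] for i in range(10)]
    clfs = tri_train(L_X, L_y, U_X)
    print([vote(clfs, x) for x in ([0.0], [0.5], [1.0])])
```

The design choice worth noting is that pseudo-labeled samples enter training with weight equal to their confidence rather than 1, so a sample that barely passes the filter influences the split less than a labeled one. In a real traffic detector, `Stump` would be replaced by full decision trees or a tree ensemble trained on the dataset's feature vectors.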
Keywords/Search Tags: Tri-training, Semi-supervised, Machine learning, Network security, Decision tree, Traffic classification