| Advances in network attack technology have led to a sharp increase in the number of unknown network threats in cyberspace, posing a serious threat to the security of critical systems and infrastructure. However, existing methods struggle to detect unknown network threats in large-scale network systems. To this end, this dissertation studies machine-learning-based methods for unknown network threat detection. Identifying the activities and intentions of unknown network threats from massive multi-source heterogeneous data is an important research topic in cyber security situation awareness; it helps defenders identify potential risks and guides them to respond in time. This dissertation summarizes current research on unknown network threat detection, points out the main problems in current methods, designs a data-driven framework for unknown network threat detection, and proposes a series of machine-learning-based detection methods. The proposed methods can significantly improve threat detection capabilities in large-scale network systems, shorten the time needed to discover network attacks, and effectively support network security situational awareness and decision-making. Building on this analysis of the background and open problems, the main work and contributions are as follows: 1. To address the lack of implementation approaches and theoretical support for unknown network threat detection in large-scale network systems, a data-driven, machine-learning-based detection framework is proposed. First, a data classification method for network security detection is presented, and a detection framework is designed on that basis. The key issues faced by each detection method in the framework are then identified, providing theoretical support for unknown network threat detection; 2. To address the difficulty existing methods have in identifying few-shot
and unknown malicious traffic in large-scale network systems, a new method based on contrastive learning for fine-grained unknown malicious traffic classification is proposed. It builds on a Conditional Variational Auto-Encoder (CVAE) and Extreme Value Theory (EVT), which are used for known and unknown traffic classification respectively. Unlike other methods, it applies contrastive learning in different CVAE stages, which significantly mitigates the effect of scarce labeled data; 3. To address the difficulty existing methods have in incrementally identifying new malicious traffic in large-scale network systems, a new method based on contrastive incremental learning for fine-grained malicious traffic classification is proposed. It builds on a Variational Auto-Encoder (VAE) and EVT. Specifically, contrastive learning is integrated into the VAE encoder, and A-Softmax is used to classify known and few-shot malicious traffic; EVT and the VAE decoder are used to recognize unknown malicious traffic; and VAE reconstruction together with knowledge distillation allows all classes to be recognized when learning new tasks without retaining many old samples. The proposed method thus meets the need for incremental learning in traffic classification; 4. To address the large number of false positives that existing methods usually generate when detecting unknown malicious user behaviors in large-scale network systems, a new method based on Generalized Zero-Shot Learning (GZSL) for analyzing unknown malicious user behaviors is proposed. It first uses a Graph Convolutional Network (GCN) to reduce the effect of differing user behavior patterns before detection. A hyper-spherical VAE method based on semantic information is then used to identify unknown malicious user behaviors, which improves the accuracy of malicious user behavior detection in dynamic environments; 5. To address the difficulty existing methods have in detecting unknown multi-step attacks
in large-scale network systems, a new method based on alert and log correlation is proposed for multi-step attack forensics and traceability. It adopts an edge computing architecture and first analyzes security alerts and logs using an ontology based on Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK). It then proposes an alert reduction method based on Word Mover’s Distance (WMD), an alert correlation method based on attributes and communication structure, and a multi-step attack reconstruction method based on Monte Carlo Tree Search (MCTS) for multi-step attack detection, which overcomes the difficulty of detecting unknown multi-step attacks. The research results of this dissertation address the problem of detecting unknown network threats in large-scale network environments. They provide theoretical support, model guidance, and methodological support for unknown network threat detection, and help security managers carry out network defense actions in a timely manner. |
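
Contributions 2 and 3 above rely on Extreme Value Theory to separate unknown traffic from known classes. The following is a minimal sketch of that general idea, not the dissertation's implementation. It assumes each sample already has a scalar anomaly score (for example, a VAE reconstruction error), fits a generalized Pareto distribution to the largest scores of known traffic with a simple method-of-moments estimator, and derives a rejection threshold. The function names, the `tail_fraction` and `target_prob` parameters, and the synthetic scores are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def fit_gpd_moments(excesses):
    """Method-of-moments estimate of the GPD shape (xi) and scale (sigma)."""
    mean = statistics.fmean(excesses)
    var = statistics.variance(excesses)
    if var == 0:
        raise ValueError("excesses must not all be identical")
    ratio = mean * mean / var
    return 0.5 * (1.0 - ratio), 0.5 * mean * (1.0 + ratio)


def evt_threshold(scores, tail_fraction=0.1, target_prob=1e-3):
    """Peaks-over-threshold estimate of the score exceeded with probability target_prob.

    scores: anomaly scores of samples from KNOWN classes (assumed input).
    tail_fraction: share of the largest scores used to fit the tail (assumed value).
    """
    ordered = sorted(scores)
    n = len(ordered)
    n_tail = max(2, int(n * tail_fraction))
    if n <= n_tail:
        raise ValueError("not enough scores to fit a tail")
    u = ordered[n - n_tail - 1]  # initial threshold just below the tail
    excesses = [s - u for s in ordered[n - n_tail:]]
    xi, sigma = fit_gpd_moments(excesses)
    r = target_prob * n / n_tail
    if abs(xi) < 1e-9:
        # Limit case xi -> 0: the tail is exponential.
        return u - sigma * math.log(r)
    return u + (sigma / xi) * (r ** (-xi) - 1.0)


def is_unknown(score, threshold):
    """Flag a sample as unknown when its score lies beyond the fitted tail."""
    return score > threshold


if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-in for reconstruction errors of known traffic.
    known_errors = [random.expovariate(1.0) for _ in range(5000)]
    threshold = evt_threshold(known_errors)
    print(is_unknown(0.5, threshold), is_unknown(50.0, threshold))
```

The method-of-moments fit keeps the sketch dependency-free; a maximum-likelihood fit from a statistics library would be a natural replacement in practice.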