| With the development of network technology, network security issues have become increasingly prominent. Advanced persistent threats (APTs) in industrial information scenarios are difficult to detect because of their long latency period, high concealment, low frequency, and slow pace. Because APT attacks occur at low frequency, attack behavior makes up only a very small proportion of all system behavior during an attack, so normal and attack behavior are hard to distinguish in the short term. Over the long term, however, the data accumulated from abnormal behavior differ significantly from normal behavior, which makes extracting the long-term characteristics of behavior sequences critical. At the same time, a system generates a large and continually growing volume of data, and current detection methods struggle to mine features from such massive data efficiently and accurately, which renders traditional attack detection methods ineffective. Moreover, most current APT detection methods focus on the characteristics of individual attack behaviors while neglecting the latent correlations that accumulate over a persistent attack, which limits their effectiveness against APTs. This paper proposes an APT attack detection method based on provenance graphs: it constructs provenance graphs from system call relationships, extracts long-term features while retaining rich contextual information, models normal system behavior with a clustering algorithm, and identifies attack activities without predefined attack features. The main work consists of three parts. (1) Provenance graph feature representation based on system call logs. System call logs are first converted into provenance graphs; then, in combination with the Causality-Preserving Reduction (CPR) algorithm, breadth-first directed random walks and Word2Vec embedding are used to embed the provenance graphs into a low-dimensional continuous vector space. A comparative experiment verifies that the proposed embedding method is effective and usable: it reduces data size while retaining as much of the graph's topological information as possible. (2) Long-term feature extraction of APT attacks based on provenance graph embeddings. Because APT attacks have long latency periods and low frequency, this paper applies multi-head attention to the sequence of provenance graph feature representations produced in the first stage, capturing hidden associations among elements of the provenance graph embeddings and thereby extracting long-term features of the APT attack process. The result is a feature vector that summarizes the entire sequence and serves as input to the subsequent clustering algorithm. A comparative experiment shows that this long-term feature extraction method captures global dependency information from the provenance graph sequence and significantly improves both training speed and detection performance, raising training speed by 24% and accuracy by 11%. (3) Unsupervised APT attack detection. Because a model trained on attack data may generalize poorly to unknown attacks, this paper proposes an unsupervised one-class detection model that uses a clustering algorithm to model normal system behavior only. During detection, attack activities are identified by the distance between the extracted long-term feature vector of a provenance graph and the cluster center. Experiments on five widely used datasets evaluate the APT attack detection architecture built from these three steps, and the results show that the proposed method improves the overall accuracy of the model and achieves better results than current state-of-the-art detection methods. |
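Step (1) can be illustrated with a small sketch. It assumes each audit event is a `(src, dst, op)` triple listed in time order, implements a simplified version of the CPR merging rule (a repeated event between the same pair of entities is merged when no event entered the source or left the destination in between), and reads "breadth-first directed random walk" as a breadth-first traversal along edge direction that visits successors in random order. Both readings, the names `cpr_reduce` and `bfs_walks`, and the sample events are hypothetical, not the paper's implementation.

```python
import random
from collections import defaultdict, deque

# Hypothetical audit event (src, dst, op), e.g. ("bash", "/etc/passwd", "read").
# Events are assumed to be listed in timestamp order.
Event = tuple[str, str, str]


def cpr_reduce(events: list[Event]) -> list[Event]:
    """Simplified causality-preserving reduction (an approximation, not the
    original CPR algorithm): a repeated (src, dst, op) event is merged into
    the previous one when, in between, no event flowed into src and no event
    flowed out of dst, so both occurrences carry the same dependencies."""
    last_in: dict[str, int] = {}   # node -> index of the latest event entering it
    last_out: dict[str, int] = {}  # node -> index of the latest event leaving it
    prev: dict[Event, int] = {}    # event key -> index of its latest occurrence
    kept: list[Event] = []
    for i, (src, dst, op) in enumerate(events):
        key = (src, dst, op)
        j = prev.get(key)
        mergeable = (
            j is not None
            and last_in.get(src, -1) <= j
            and last_out.get(dst, -1) <= j
        )
        if not mergeable:
            kept.append(key)
        prev[key] = i
        last_out[src] = i
        last_in[dst] = i
    return kept


def bfs_walks(events: list[Event], walks_per_node: int = 2,
              max_len: int = 5, seed: int = 0) -> list[list[str]]:
    """Hypothetical breadth-first directed walk: a breadth-first traversal
    that follows edge direction, visits each node's successors in random
    order, and stops after max_len nodes."""
    adj: dict[str, list[str]] = defaultdict(list)
    for src, dst, _op in events:
        if dst not in adj[src]:
            adj[src].append(dst)
    rng = random.Random(seed)
    walks = []
    for start in sorted(adj):  # only nodes with outgoing edges start a walk
        for _ in range(walks_per_node):
            walk, seen, frontier = [], {start}, deque([start])
            while frontier and len(walk) < max_len:
                node = frontier.popleft()
                walk.append(node)
                successors = list(adj.get(node, []))
                rng.shuffle(successors)  # random order among same-depth nodes
                for nxt in successors:
                    if nxt not in seen:
                        seen.add(nxt)
                        frontier.append(nxt)
            walks.append(walk)
    return walks


events = [
    ("bash", "/etc/passwd", "read"),
    ("bash", "/etc/passwd", "read"),  # merged: nothing entered bash or left the file
    ("sshd", "bash", "fork"),
    ("bash", "/etc/passwd", "read"),  # kept: sshd -> bash entered bash in between
    ("bash", "10.0.0.5", "connect"),
]
reduced = cpr_reduce(events)
walks = bfs_walks(reduced)
print(len(reduced))  # 4
```

The walks would then serve as sentences for Word2Vec, for example gensim's `Word2Vec(sentences=walks, vector_size=64, window=3, min_count=1, sg=1)`, after which `model.wv["bash"]` returns the vector for a node. How node vectors are aggregated into one embedding per provenance graph is not specified in the text and is left open here.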
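For step (2), the sketch below shows multi-head scaled dot-product self-attention over a sequence of provenance graph embeddings, followed by mean pooling into a single feature vector for the clustering stage. It is written in plain Python for readability; the random (untrained) weights, the mean-pooling choice, and the names `MultiHeadSelfAttention` and `sequence_feature` are assumptions for illustration, and the paper's actual architecture and training procedure may differ.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax_rows(a: Matrix) -> Matrix:
    out = []
    for row in a:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    std = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


class MultiHeadSelfAttention:
    """Scaled dot-product self-attention with n_heads heads. Weights are
    random here; in a real detector they would be learned in training."""

    def __init__(self, d_model: int, n_heads: int, seed: int = 0):
        if d_model % n_heads:
            raise ValueError("d_model must be divisible by n_heads")
        rng = random.Random(seed)
        self.d_head = d_model // n_heads
        # One (W_q, W_k, W_v) triple per head, each of shape d_model x d_head.
        self.heads = [
            tuple(random_matrix(d_model, self.d_head, rng) for _ in range(3))
            for _ in range(n_heads)
        ]
        self.w_o = random_matrix(d_model, d_model, rng)  # output projection

    def __call__(self, x: Matrix) -> Matrix:
        head_outputs = []
        for w_q, w_k, w_v in self.heads:
            q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
            scores = matmul(q, [list(c) for c in zip(*k)])  # q @ k^T, shape T x T
            scale = math.sqrt(self.d_head)
            weights = softmax_rows([[s / scale for s in row] for row in scores])
            # Each time step attends to every other step, however far apart.
            head_outputs.append(matmul(weights, v))
        # Concatenate the heads per time step, then project back to d_model.
        concat = [sum((h[t] for h in head_outputs), []) for t in range(len(x))]
        return matmul(concat, self.w_o)


def sequence_feature(x: Matrix, attn: MultiHeadSelfAttention) -> list[float]:
    """Mean-pool the attended sequence into one long-term feature vector."""
    attended = attn(x)
    return [sum(col) / len(attended) for col in zip(*attended)]


rng = random.Random(42)
seq = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(6)]  # 6 steps, 8-dim
attn = MultiHeadSelfAttention(d_model=8, n_heads=2)
feature = sequence_feature(seq, attn)
print(len(feature))  # 8
```

Because no positional encoding is added, this pooled vector does not depend on the order of the sequence (reversing it gives the same feature up to rounding); a trained model would typically add positional information so that the temporal order of provenance graphs matters. In practice a library layer such as PyTorch's `torch.nn.MultiheadAttention` would replace the hand-written matrix code.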
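Step (3) does not name a specific clustering algorithm or threshold rule, so the sketch below substitutes k-means (with deterministic farthest-first initialization) and assumes an input is flagged when its distance to the nearest cluster center exceeds a quantile of the training vectors' own distances. It is fitted on normal feature vectors only, so no attack data is needed; `OneClassClusterDetector`, `quantile`, and the toy 2-D data are hypothetical.

```python
import math


def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-first initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next initial center: the point farthest from all chosen centers.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        new_centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


class OneClassClusterDetector:
    """Cluster normal-behaviour feature vectors; flag a vector as an attack
    when its distance to the nearest center exceeds a threshold taken from
    the distances of the training vectors."""

    def __init__(self, k=2, quantile=1.0):
        self.k, self.quantile = k, quantile

    def score(self, x):
        return min(math.dist(x, c) for c in self.centers)

    def fit(self, normal):
        self.centers = kmeans(normal, self.k)
        dists = sorted(self.score(p) for p in normal)
        idx = min(len(dists) - 1, max(0, math.ceil(self.quantile * len(dists)) - 1))
        self.threshold = dists[idx]
        return self

    def is_attack(self, x):
        return self.score(x) > self.threshold


normal = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
detector = OneClassClusterDetector(k=2).fit(normal)
print(detector.is_attack((0.5, 0.6)))  # False
print(detector.is_attack((5, 5)))      # True
```

With `quantile=1.0` the threshold is the largest training distance, so every normal training vector is accepted; a lower quantile flags more borderline vectors, trading fewer missed attacks for more false alarms.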