| Although big data has brought many conveniences, network environments and attack methods have become increasingly complex, which greatly reduces the accuracy and effectiveness of traditional intrusion detection methods. Identifying intrusion behavior in large data streams has therefore become a major challenge for current intrusion detection systems. To address the difficulty that intrusion detection in big data environments has with high-dimensional learning and accurate detection of user behavior, an efficient and accurate intrusion detection method based on provenance graphs is proposed. The method applies different detection stages flexibly according to the size of the provenance graph and the degree of infection, so that efficiency and accuracy requirements are met at the same time. Detection is divided into two stages: fast judgment and accurate judgment. First, node importance is defined and an importance value is computed for each node. In the fast judgment stage, a subset of high-importance nodes and their neighboring nodes is selected, and the provenance graph is converted into feature vectors through mapping rules; this quickly extracts the main features of the provenance graph and effectively reduces the size of the data set. If the detection result is significantly higher than a high threshold or lower than a low threshold, the result is decided directly, which greatly improves detection efficiency. Otherwise, the accurate judgment stage is applied: the number of selected nodes is increased, the scale of perception is enlarged, and the provenance graph information is mined comprehensively before a judgment is made, which effectively improves accuracy. Experiments on 11 different provenance datasets show that, compared with CNN (one-stage), Pagoda, and Unicorn, the proposed method improves accuracy on average by 4.15%, 11.88%, and 30.12%; precision by 3.77%, 10.29%, and 52.24%; recall by 6.25%, 37.25%, and 40.59%; and F-score by 5.62%, 31.97%, and 49.48%, respectively. The intrusion detection time and storage space remain within a reasonable range. |
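
The two-stage decision flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the node-importance definition, the mapping rules, the classifier, or the threshold values, so several things here are assumptions made only for illustration: node degree stands in for importance, the caller-supplied `score` function stands in for the trained detector, and the names `select_nodes`, `two_stage_detect`, `k_fast`, `k_full`, `low`, and `high` are hypothetical. The rule used in the second stage (comparing against the midpoint of the two thresholds) is also an assumption.

```python
from typing import Callable, Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]  # adjacency map: node -> neighbouring nodes


def node_importance(graph: Graph) -> Dict[str, float]:
    """Assumed importance measure: node degree (not defined in the abstract)."""
    return {node: float(len(neighbours)) for node, neighbours in graph.items()}


def select_nodes(graph: Graph, k: int) -> List[str]:
    """Return the k most important nodes together with their direct neighbours."""
    importance = node_importance(graph)
    # Sort by descending importance, breaking ties by node name for determinism.
    top = sorted(importance, key=lambda n: (-importance[n], n))[:k]
    selected = set(top)
    for node in top:
        selected |= graph[node]
    return sorted(selected)


def two_stage_detect(
    graph: Graph,
    score: Callable[[List[str]], float],  # hypothetical detector, returns 0..1
    k_fast: int,
    k_full: int,
    low: float,
    high: float,
) -> Tuple[str, str]:
    """Return (label, stage) where stage is 'fast' or 'accurate'."""
    fast_score = score(select_nodes(graph, k_fast))
    if fast_score > high:
        return "intrusion", "fast"
    if fast_score < low:
        return "benign", "fast"
    # Ambiguous result: widen the perception scale by selecting more nodes.
    full_score = score(select_nodes(graph, k_full))
    label = "intrusion" if full_score >= (low + high) / 2 else "benign"
    return label, "accurate"


if __name__ == "__main__":
    toy_graph: Graph = {
        "a": {"b", "c", "d"},
        "b": {"a"},
        "c": {"a"},
        "d": {"a", "e"},
        "e": {"d"},
    }
    suspicious = {"a", "b", "e"}  # toy ground truth used by the toy scorer

    def toy_score(nodes: List[str]) -> float:
        # Fraction of selected nodes that are marked suspicious.
        return sum(n in suspicious for n in nodes) / len(nodes)

    print(two_stage_detect(toy_graph, toy_score, k_fast=1, k_full=2, low=0.2, high=0.8))
```

In this sketch the cheap first pass only looks at a small neighbourhood, and the larger selection is computed only when the first score falls between the two thresholds, which mirrors the efficiency argument made in the abstract.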