Machine learning is one of the most popular technologies in computer science. It has been widely used in image processing, natural language processing, and network security. Although machine learning algorithms have achieved good results in many practical applications, research in recent years has shown that they face various security threats from attackers. Among these threats, a poisoning attack is an induced attack that can seriously damage the validity, integrity, and availability of a machine learning model by modifying samples in the original training data set, or by injecting poisoned samples into it, during the training phase. The poisoned samples induce drift in the training data, causing a significant drop in the performance of the target machine learning model. This paper proposes two methods for constructing poisoned samples against common machine learning algorithms, and further proposes a poisoning strategy for black-box machine learning models, in order to study the security threats to, and the performance impact on, machine learning algorithms. In addition, based on the characteristics of poisoned samples, this paper proposes a sample legitimacy evaluation method to improve the robustness of machine learning algorithms against poisoning attacks. The main contributions of this paper are as follows:

(1) Two boundary-mode data poisoning attack methods based on data drift are proposed. Data drift occurs when the data distribution of the training data deviates from the actual data distribution. The attacker deliberately injects poisoned data into the original training data set, causing data drift in the training set. This paper first proposes a definition of, and a detection method for, boundary-mode data, which can cause data drift. On this basis, two methods for constructing boundary-mode data are proposed: the central vector extrapolation method and the batch edge-mode data extrapolation method. Together they achieve an effective poisoning attack on the training data set. In addition, experiments on two practical applications, a network data detection data set and a handwritten character data set, show that these two poisoning attack methods can seriously degrade the performance of six commonly used machine learning algorithms.

(2) A poisoning attack strategy for black-box machine learning models is proposed. In practical applications, specific information about the target machine learning system is not easy to obtain, so for the attacker it is a black-box model. This paper first proposes an improved SMOTE algorithm to amplify part of the training data, and combines it with a DNN to train a substitute for the target model, thereby achieving model stealing. Based on the stolen model, the two poisoned-sample construction methods proposed above are used to design and implement different poisoning attack strategies. In addition, the performance of the different poisoning strategies is analyzed and compared through experiments on a network intrusion detection data set.

(3) A sample legitimacy evaluation algorithm based on multiple spectral clustering aggregation is proposed. Current defenses against poisoning attacks focus on data cleansing and on improving algorithm robustness, but lack an assessment of the legitimacy of individual samples. This paper analyzes the characteristics of existing poisoned samples and combines spectral clustering with ensemble learning to propose a method for scoring the legitimacy of samples, achieving an effective evaluation of sample legality. The effectiveness of the evaluation method is verified by experiments on an intrusion detection data set.

The experimental results show that the proposed poisoning attack methods can effectively degrade the performance of commonly used machine learning algorithms. Moreover, the construction algorithms for poisoned samples are simple to implement, and poisoned samples can be constructed quickly and effectively. On this basis, the proposed poisoning strategy for black-box machine learning models can achieve effective poisoning attacks on the target system under a weaker adversary model, reducing the conditions required for carrying out the attack. Finally, with respect to defenses against poisoned samples, the legitimacy evaluation method proposed in this paper can provide a reasonable reference for the use of training samples by machine learning algorithms and improve their robustness.
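The central vector extrapolation idea of contribution (1) can be illustrated with a minimal sketch. The thesis's exact formulation is not reproduced here; this version simply assumes the attacker pushes class samples outward along the direction from the class centroid (the central vector), with a hypothetical scaling factor `alpha` controlling how far beyond the class boundary the poisoned points land:

```python
import numpy as np

def central_vector_extrapolation(X_class, alpha=1.5):
    """Craft candidate poisoned points by pushing each sample away from
    the class centroid along the centroid-to-sample direction.
    alpha > 1 extrapolates beyond the original sample; the parameter
    and its range are illustrative assumptions, not the thesis's exact rule."""
    center = X_class.mean(axis=0)            # central vector of the class
    return center + alpha * (X_class - center)

# Toy usage: three 2-D samples of one class, extrapolated with alpha = 2.
X = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 1.0]])
poison = central_vector_extrapolation(X, alpha=2.0)
```

Note that the transformation preserves the class centroid while doubling each sample's distance from it, so the poisoned points straddle the original decision boundary region.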
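The data-amplification step of contribution (2) is built on SMOTE. The thesis's improved variant and the DNN substitute-model training are not specified in this abstract, so the sketch below only shows the classic SMOTE interpolation idea in plain NumPy; the neighbour count `k` and the neighbour-selection rule are illustrative assumptions:

```python
import numpy as np

def smote_interpolate(X, n_new, k=3, rng=None):
    """Generate n_new synthetic samples by interpolating between a
    randomly chosen point and one of its k nearest neighbours
    (the classic SMOTE idea; the thesis's improved variant may differ)."""
    rng = np.random.default_rng(rng)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X))
        d = np.linalg.norm(X - X[i], axis=1)   # distances to X[i]
        nn = np.argsort(d)[1:k + 1]            # k nearest neighbours, excluding X[i]
        j = rng.choice(nn)
        lam = rng.random()                     # interpolation factor in [0, 1)
        synthetic.append(X[i] + lam * (X[j] - X[i]))
    return np.array(synthetic)

# Toy usage: amplify four 2-D samples into ten synthetic ones.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_interpolate(X, n_new=10, k=2, rng=0)
```

Because each synthetic point is a convex combination of two real samples, the amplified set stays inside the original data's per-feature range, which is what makes it usable as extra training data for the substitute model.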
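The aggregation idea behind contribution (3) can also be sketched. The thesis's scoring rule is not given in this abstract; the hypothetical version below runs several two-way spectral partitions (one per RBF bandwidth) and scores each sample by how often its cluster, aligned with the labels by majority vote, agrees with its own label, so samples that repeatedly disagree get low legitimacy scores:

```python
import numpy as np

def spectral_bipartition(X, gamma):
    """Two-way spectral partition: sign of the Fiedler vector of the
    normalized graph Laplacian built from an RBF affinity matrix."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    W = np.exp(-gamma * d2)                               # RBF affinity
    D = W.sum(axis=1)
    L = np.eye(len(X)) - W / np.sqrt(np.outer(D, D))      # normalized Laplacian
    _, vecs = np.linalg.eigh(L)
    return (vecs[:, 1] >= 0).astype(int)                  # split on the Fiedler vector's sign

def legitimacy_scores(X, y, gammas=(0.1, 0.2, 0.3)):
    """Aggregate several spectral partitions into a per-sample score in
    [0, 1]; the bandwidths and the majority-vote alignment are
    illustrative assumptions, not the thesis's exact method."""
    votes = np.zeros(len(X))
    for g in gammas:
        part = spectral_bipartition(X, g)
        if (part == y).mean() < 0.5:          # align cluster ids with labels
            part = 1 - part
        votes += (part == y)
    return votes / len(gammas)
```

As a toy check, two well-separated clusters with consistent labels should score near 1, while a point placed in one cluster but labeled as the other (a crude stand-in for a poisoned sample) should score lower.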