| With the advent of the big data era, machine learning has spread into many fields, such as face recognition, autonomous driving, malware detection, and intelligent medical diagnosis. However, high-performance machine learning models (deep neural networks) are inherently opaque: people cannot understand the root causes of their decisions, which makes deep neural networks difficult to apply to high-risk real-world tasks and deepens the trust gap between models and people. Machine learning interpretability emerged to address this problem. Interpretability supports the credibility, transparency, and fairness of model decisions and plays a crucial role in building a bridge of trust between artificial intelligence and people. Unfortunately, while interpretability increases model transparency, the additional explanatory information also exposes the model itself to new threats. Research on attacks that exploit machine learning interpretability is still at an early, exploratory stage, and no systematic research framework has been established. How to balance the delicate relationship between model interpretation methods and model security is therefore an urgent problem. This paper focuses on attacks and defenses built on machine learning interpretability and carries out two lines of work. The specific research contents and contributions are as follows.

(1) A small-perturbation adversarial example attack method based on a machine learning interpretability model is proposed. The method extracts, from the model's interpretation information, the features that contribute most to the model's decision, and then adds perturbations only to those features, generating adversarial examples whose perturbation range is small enough to be imperceptible to the human eye (an illustrative sketch of this idea follows the abstract). The experiments cover two datasets and evaluate the attack in both white-box and black-box settings. The results show that, while maintaining the attack success rate and distortion rate, the method greatly reduces the perturbation range and produces adversarial examples that are difficult for the human eye to detect.

(2) An interpretation method based on user query data is proposed. This method explains a deep learning model from the data perspective: it returns, as the interpretation, the data whose model predictions are most similar to the prediction for the input (a sketch of this idea also follows the abstract). In addition, by releasing interpretation information in this form, the method reduces the risk of privacy disclosure and defends against membership inference attacks. The validity and correctness of the method are evaluated using two criteria: the degree of label agreement between the interpretation data and the user data, and the similarity of the model's prediction confidences. The experiments cover a variety of deep neural network models and datasets, and the results verify the generality of the interpretation method based on user query data. |
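
The abstract does not specify how the interpretation information is computed or how the perturbation is restricted. The following is a minimal sketch of the idea behind contribution (1), assuming a gradient saliency map serves as the interpretation and an FGSM-style sign step is applied only to the top-k most salient input features; the function name `topk_saliency_attack` and the parameters `k` and `eps` are illustrative, not from the thesis.

```python
import torch
import torch.nn.functional as F

def topk_saliency_attack(model, x, y, k=100, eps=0.03):
    """Sketch: perturb only the k most influential input features.

    Assumptions (not stated in the abstract): the interpretation is a
    gradient saliency map, and the perturbation is an FGSM-style sign
    step masked to the top-k features per example.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()

    grad = x.grad.detach()
    saliency = grad.abs().flatten(start_dim=1)            # per-feature importance
    idx = saliency.topk(k, dim=1).indices                 # k most influential features
    mask = torch.zeros_like(saliency).scatter_(1, idx, 1.0).view_as(x)

    # Perturb only the selected features, keeping the visible change small.
    x_adv = x + eps * grad.sign() * mask
    return x_adv.clamp(0.0, 1.0).detach()
```

Restricting the perturbation to the features the interpretation marks as decisive is what keeps the overall disturbance range small while preserving the attack's effectiveness.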
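
For contribution (2), the abstract describes returning the data most similar to the prediction of the query as the explanation. A minimal sketch under assumed details is given below: similarity is measured as cosine similarity between softmax confidence vectors, and the explanation consists of the `top_n` closest examples from a reference pool; the names `explain_by_similar_predictions`, `reference_data`, and `top_n` are hypothetical.

```python
import torch
import torch.nn.functional as F

def explain_by_similar_predictions(model, x_query, reference_data, top_n=5):
    """Sketch: example-based explanation via prediction-confidence similarity.

    Assumptions (not stated in the abstract): explanations are the top_n
    reference examples whose softmax confidence vectors are closest (cosine
    similarity) to the confidence vector of the query.
    """
    model.eval()
    with torch.no_grad():
        q_conf = F.softmax(model(x_query.unsqueeze(0)), dim=1)   # (1, C)
        r_conf = F.softmax(model(reference_data), dim=1)         # (N, C)
        sims = F.cosine_similarity(q_conf, r_conf, dim=1)        # (N,)
        idx = sims.topk(top_n).indices
    # The explanation is a set of similar examples rather than gradients or
    # internal parameters, which limits the information available to a
    # membership-inference adversary.
    return reference_data[idx], sims[idx]
```

Because the explanation exposes only other examples and their similarity scores, rather than model internals or per-sample confidence of the training data, this form of interpretation is the basis for the privacy-preserving behavior claimed in the abstract.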