| In recent years, applications based on deep learning, including face recognition, autonomous driving, and intelligent voice assistants, have developed rapidly and become inseparable from people’s work and daily life. Training a high-performance deep learning model requires large amounts of training data and expensive hardware resources, so a trained model is the intellectual property of its trainer. A model extraction attack allows an adversary to train a substitute model with similar functionality using only the outputs of the target black-box model, which seriously infringes the legitimate rights and interests of the model holder. The study of model extraction attacks and their defenses therefore holds great practical and security value. Existing research shows that dataset inference (DI) and style feature inference can be used to resist model extraction attacks. Dataset inference determines theft by calculating the distance from private samples to the decision boundary of the model, but the adversary can fine-tune the adversary model to shift the decision boundary slightly and avoid verification. Style feature inference embeds style features in the model and determines theft by judging whether the adversary model contains these features. However, this method requires white-box access to the adversary model, which is difficult to obtain. In view of these shortcomings of existing work, the contributions and main research contents are as follows: (1) We propose a new system called MEW, which combines a model inversion attack with Elastic Weight Consolidation (EWC) to evade detection by DI. We first use the pre-trained adversary model to generate a data pool and select samples from it to approximate the Fisher Information Matrix of the previous task. We then use an adaptation of EWC to fine-tune the adversary model, which moves its decision boundary slightly and evades detection by DI. (2) We propose a model extraction defense method based on external embedded features for black-box scenarios, which determines theft by detecting whether the adversary model has (partial) knowledge of the victim's external embedded features. First, we convert some training images into gray-scale images as the embedding and inject them into the training set. Then, we train a binary classifier to determine whether a model was stolen from the victim. (3) We extend the types of model extraction attacks and divide them into data-accessible attacks, model-accessible attacks, and query-only attacks. We test the two methods above on three baseline datasets and against the different types of attacks. The experiments demonstrate the effectiveness of both methods. |
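
As a rough illustration of the EWC step mentioned in contribution (1), the sketch below computes the standard EWC penalty with a diagonal Fisher estimate. It is not the adaptation used in MEW, which the abstract does not specify. The function names `diagonal_fisher` and `ewc_penalty`, the weight `lam`, and the toy gradients are all hypothetical choices made for this example.

```python
def diagonal_fisher(per_sample_grads):
    """Diagonal empirical Fisher estimate (assumed form, not from the source):
    the mean of squared per-sample gradients, one value per parameter."""
    n_samples = len(per_sample_grads)
    n_params = len(per_sample_grads[0])
    return [
        sum(g[i] ** 2 for g in per_sample_grads) / n_samples
        for i in range(n_params)
    ]


def ewc_penalty(theta, theta_star, fisher_diag, lam):
    """Standard EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    theta      -- current parameters during fine-tuning
    theta_star -- parameters of the pre-trained model being anchored to
    lam        -- hypothetical penalty weight; larger values keep theta closer
    """
    return 0.5 * lam * sum(
        f * (t - s) ** 2 for f, t, s in zip(fisher_diag, theta, theta_star)
    )


# Toy example: two parameters, two samples. Only the first parameter has
# non-zero gradients, so only movement in that parameter is penalized.
toy_grads = [[1.0, 0.0], [3.0, 0.0]]
fisher = diagonal_fisher(toy_grads)
penalty = ewc_penalty([0.1, 0.1], [0.0, 0.0], fisher, lam=2.0)
```

During fine-tuning, this penalty would be added to the new task's loss, so parameters with large Fisher values stay close to their pre-trained values while the remaining parameters are free to move.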