With the widespread deployment of high-performance machine learning models across many fields, their privacy leakage has drawn broad public attention. Given a target machine learning model, model inversion attacks aim to reconstruct input samples corresponding to labels of interest, posing a serious threat to personal privacy. In recent years, a large body of work has studied how to mount model inversion attacks via input optimization, generative networks, or gradient leakage. However, owing to the difficulty of optimizing against deep neural networks and the high dimensionality of the image space to be recovered, existing attacks either fail against deep networks or require white-box access to the target model, which greatly limits their performance and applicable scenarios.

In this paper, we investigate model inversion attacks in a pure black-box setting where the target model outputs only confidence scores. By mining the distribution information obtained from adversarial queries, we design two efficient attack schemes based on zeroth-order optimization strategies, one for the setting with no auxiliary data and one for the setting where an alternative dataset is available. Our theoretical analysis and experimental evaluation show that both schemes achieve strong performance on multiple datasets and models in this confidence-only black-box setting. The main research contents and contributions of this paper are summarized as follows:

(1) We propose SODA, an input-optimization-based model inversion attack for the black-box setting with confidence-score output only, under the restriction that the distribution of an alternative dataset is unavailable. The scheme casts model inversion as energy minimization: it first samples an initial image from a Gaussian distribution; it then applies a natural-image prior together with symmetry and random-jitter constraints, and obtains a zeroth-order gradient estimate of the input image through repeated queries to the target network (a minimal estimator of this kind is sketched in the first code example below); finally, it optimizes the inversion image for the target label within the constrained search space. Theoretical analysis and experimental verification show that SODA can effectively invert users' private features and class-instance images.

(2) We propose SA-GAN, a generative-adversarial-network-based model inversion attack for the black-box setting where only confidence scores are output and an alternative dataset is available. The scheme first trains a contrastive learning model to obtain a decoupled, dimensionality-reduced representation of the alternative dataset space. This decoupled space then serves as the input space of the GAN generator; an additional similarity discriminator is added, and SA-GAN is trained under the triple constraints of image-reconstruction error, discriminator authenticity loss, and vector-similarity loss. Finally, zeroth-order gradient optimization is performed with the trained generator over the decoupled low-dimensional input space to effectively invert the target image (see the second code sketch below). Experimental analysis shows that SA-GAN can mount model inversion attacks under a low query budget, obtaining effective class-instance inversion images while significantly improving the stealthiness of the attack.
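To make the SODA-style pipeline concrete, the following is a minimal sketch of a two-point zeroth-order gradient estimator combined with pixel-space confidence ascent under a simple total-variation prior. The oracle `query_confidence(image, label)`, the function names, and all hyperparameters are illustrative stand-ins, not the paper's implementation; the symmetry and random-jitter constraints mentioned above are omitted for brevity.

```python
import numpy as np

def estimate_gradient(query_confidence, x, target_label, n_dirs=20, sigma=1e-3):
    """Two-point zeroth-order estimate of d(confidence)/d(x) from random probes."""
    grad = np.zeros_like(x)
    for _ in range(n_dirs):
        u = np.random.randn(*x.shape)                  # random probe direction
        f_plus = query_confidence(x + sigma * u, target_label)
        f_minus = query_confidence(x - sigma * u, target_label)
        grad += (f_plus - f_minus) / (2.0 * sigma) * u
    return grad / n_dirs

def soda_style_inversion(query_confidence, target_label, shape=(32, 32),
                         steps=500, lr=0.05, tv_weight=1e-4):
    """Pixel-space inversion: Gaussian init, zeroth-order ascent, TV prior."""
    x = np.clip(0.5 + 0.1 * np.random.randn(*shape), 0.0, 1.0)  # Gaussian-sampled start
    for _ in range(steps):
        grad = estimate_gradient(query_confidence, x, target_label)
        # Gradient of a squared total-variation term, a simple natural-image prior.
        tv_grad = np.zeros_like(x)
        dv = x[:-1, :] - x[1:, :]
        dh = x[:, :-1] - x[:, 1:]
        tv_grad[:-1, :] += dv
        tv_grad[1:, :] -= dv
        tv_grad[:, :-1] += dh
        tv_grad[:, 1:] -= dh
        x = np.clip(x + lr * grad - tv_weight * tv_grad, 0.0, 1.0)  # constrained step
    return x
```

Each optimization step costs `2 * n_dirs` queries, which is why query efficiency dominates the design of black-box inversion attacks of this kind.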
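The SA-GAN attack can be sketched along the same lines, except that the zeroth-order search runs over the generator's low-dimensional decoupled input space rather than over pixels. Again, `generator(z)` and `query_confidence(image, label)` are hypothetical stand-ins for the trained SA-GAN generator and the black-box confidence oracle, and the hyperparameters are illustrative only.

```python
import numpy as np

def latent_inversion(generator, query_confidence, target_label,
                     z_dim=64, steps=200, n_dirs=10, sigma=0.01, lr=0.1):
    """Zeroth-order confidence ascent in the generator's latent input space."""
    z = np.random.randn(z_dim)                         # random latent starting code
    for _ in range(steps):
        grad = np.zeros_like(z)
        for _ in range(n_dirs):
            u = np.random.randn(z_dim)                 # probe direction in latent space
            f_plus = query_confidence(generator(z + sigma * u), target_label)
            f_minus = query_confidence(generator(z - sigma * u), target_label)
            grad += (f_plus - f_minus) / (2.0 * sigma) * u
        z += lr * grad / n_dirs                        # ascend target-class confidence
    return generator(z)                                # candidate inversion image
```

Searching a `z_dim`-dimensional space instead of the full pixel space sharply reduces the number of probe directions needed per step, which is consistent with the low-query-budget property claimed for SA-GAN above.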