Research And Implementation On Data Mining Methods Based On Privacy Preserving

Posted on: 2021-07-04 | Degree: Master | Type: Thesis
Country: China | Candidate: Y L Yang
GTID: 2518306308470214 | Subject: Computer Science and Technology
Abstract/Summary:
The widespread application of big data and artificial intelligence technologies has made it possible to tap the great value hidden in data, but it has also introduced a serious risk of privacy leakage. How to achieve open sharing and efficient mining of big data while guaranteeing data security has become an increasingly important research area. To cope with the risk of privacy leakage in data mining, this thesis studies big data privacy protection techniques in depth and designs and implements two privacy-preserving data mining models for unstructured data, which effectively balance data security and data utility. The main innovations of the thesis are as follows:

(1) To address privacy leakage in deep learning models and the opacity of privacy protection, this thesis combines differential privacy with generative models and proposes an adaptive differential privacy generative adversarial network model (Adp-GAN). Through an adaptive differential privacy mechanism, Adp-GAN allocates Laplace noise to two places: the input features of the affine transformation layer of the neural network that serves as the discriminator, and the polynomial approximation coefficients of the output-layer loss function. While providing differential privacy protection, Adp-GAN effectively reduces privacy budget consumption during training and improves the utility of the model. Experiments on the standard datasets MNIST and CelebA verify that Adp-GAN generates higher-quality data. In addition, membership inference attack experiments show that Adp-GAN is better able to resist such attacks.

(2) To address the deficiencies of traditional data masking techniques, this thesis focuses on identifying sensitive data in unstructured text and constructs an adaptive data masking named entity recognition model (Adm-NER). Built on the Bi-LSTM-CRF model, Adm-NER applies adversarial transfer learning to data desensitization, which allows it to identify sensitive data effectively in domains that lack labeled samples; it further uses a self-attention mechanism to help locate word boundaries, achieving higher recognition accuracy. The results of five comparative experiments show that Adm-NER significantly improves the accuracy of sensitive data identification. In addition, a transfer learning experiment from the news domain to the medical domain shows that Adm-NER can adaptively learn shared features from large-scale labeled samples in the news domain and use them to accurately locate and recognize sensitive data in the medical domain, which facilitates subsequent data desensitization. Adm-NER offers a new approach to the intelligent design of big data masking systems.
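The noise-allocation idea in contribution (1) can be illustrated with a minimal sketch. It assumes per-feature relevance scores are already available and splits the feature budget in proportion to them, so more relevant features receive less noise. The abstract does not say how Adp-GAN computes relevance or divides its budget, so the proportional rule, the `smoothing` floor, the features-in-[0, 1] sensitivity assumption, and all function names below are hypothetical. The coefficient step follows the general functional-mechanism pattern of perturbing a polynomial approximation of the loss; it is not the thesis's exact derivation.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d.
    exponential variables with mean `scale`."""
    rate = 1.0 / scale  # expovariate takes the rate (1 / mean)
    return rng.expovariate(rate) - rng.expovariate(rate)


def allocate_budget(relevance, epsilon, smoothing=1e-3):
    """Split `epsilon` across features in proportion to |relevance|.

    `smoothing` (hypothetical) keeps every share strictly positive so no
    feature gets an infinite noise scale. The shares sum to `epsilon`.
    """
    weights = [abs(r) + smoothing for r in relevance]
    total = sum(weights)
    return [epsilon * w / total for w in weights]


def perturb_features(x, relevance, epsilon, sensitivity, rng):
    """Add Laplace(sensitivity / eps_j) noise to feature j.

    Assumes each feature's sensitivity is at most `sensitivity`
    (e.g. 1.0 for features scaled to [0, 1]); by sequential composition
    the per-feature releases together cost sum(eps_j) = epsilon.
    """
    budgets = allocate_budget(relevance, epsilon)
    return [xj + laplace_noise(sensitivity / eps_j, rng)
            for xj, eps_j in zip(x, budgets)]


def perturb_coefficients(coeffs, epsilon, sensitivity, rng):
    """Functional-mechanism-style step: add Laplace(sensitivity / epsilon)
    noise to every coefficient of a polynomial approximation of the loss.

    `sensitivity` is taken to be the L1 sensitivity of the whole
    coefficient vector.
    """
    scale = sensitivity / epsilon
    return [c + laplace_noise(scale, rng) for c in coeffs]


if __name__ == "__main__":
    rng = random.Random(0)
    eps_features, eps_coeffs = 0.5, 0.5  # hypothetical split of the total budget
    features = [0.8, 0.2, 0.5]
    relevance = [0.9, 0.1, 0.4]  # assumed to be computed elsewhere
    print(allocate_budget(relevance, eps_features))
    print(perturb_features(features, relevance, eps_features, 1.0, rng))
    print(perturb_coefficients([0.3, -1.2, 0.05], eps_coeffs, 1.0, rng))
```

Under these assumptions the total privacy cost of the sketch is `eps_features + eps_coeffs`. Shifting budget toward high-relevance features is what lets an adaptive scheme keep accuracy on the inputs that matter most for the same overall budget.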
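Contribution (2) positions Adm-NER's output as the input to subsequent data desensitization. That downstream step can be sketched as follows, assuming the recognizer emits one BIO tag per token. The BIO scheme, the entity types (PER, ORG, DATE), the `[TYPE]` placeholder format, and the helper names `bio_to_spans` and `mask_tokens` are illustrative assumptions, not taken from the thesis; the sketch does not implement Adm-NER itself.

```python
from typing import List, Optional, Tuple

# (start, end_exclusive, entity_type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Convert a BIO tag sequence into entity spans.

    An I- tag that does not continue an open span of the same type is
    treated as the start of a new span (a common lenient convention).
    """
    spans: List[Span] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags):
        continues = tag.startswith("I-") and start is not None and tag[2:] == etype
        if continues:
            continue
        if start is not None:  # close the currently open span
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith(("B-", "I-")):  # open a new span
            start, etype = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), etype))
    return spans


def mask_tokens(tokens, tags, mask_types=None, placeholder="[{}]"):
    """Replace each recognized entity span with a type placeholder.

    `mask_types` optionally restricts masking to certain entity types;
    spans of other types are copied through unchanged.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    out: List[str] = []
    pos = 0
    for start, end, etype in bio_to_spans(tags):
        if mask_types is not None and etype not in mask_types:
            continue
        out.extend(tokens[pos:start])
        out.append(placeholder.format(etype))
        pos = end
    out.extend(tokens[pos:])
    return out


if __name__ == "__main__":
    tokens = ["Patient", "Zhang", "San", "visited", "Peking", "Union",
              "Hospital", "on", "May", "3"]
    tags = ["O", "B-PER", "I-PER", "O", "B-ORG", "I-ORG", "I-ORG",
            "O", "B-DATE", "I-DATE"]
    print(mask_tokens(tokens, tags))
    print(mask_tokens(tokens, tags, mask_types={"PER"}))
```

Keeping span extraction separate from masking means the same recognizer output can drive different masking policies (for example, masking only personal names in one release and all entity types in another).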
Keywords/Search Tags: Differential privacy, Generative adversarial network, Data masking, Named entity recognition