
Data Privacy Masking Of Text Sequence Dataset Based On Generative Adversarial Network

Posted on: 2021-06-14 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Zhang
GTID: 2518306047986799 | Subject: Cyberspace security
Abstract/Summary:
In recent years, artificial intelligence technology has developed rapidly, and machine learning and deep learning have been widely applied in computer vision, natural language processing, decision-making, recommendation, and other fields. This not only makes people's lives more convenient but also promotes industrial upgrading and social progress. Data has played an important role in this development: the more representative the data, the more knowledge a machine learning model can learn. However, data privacy has not received enough attention, and privacy leaks occur frequently. As intelligent application scenarios multiply and attack methods advance, privacy risks can no longer be ignored. Traditional data privacy masking methods require carefully designed rules and still fail to strike a good balance between data usability and privacy, so they are no longer sufficient to protect privacy. Newer privacy-preserving machine learning techniques are mostly based on homomorphic encryption and secure multi-party computation, which incur large computation and communication overheads. For deep learning models with complex structures, these techniques are inefficient and difficult to deploy in practical applications. To meet the privacy-preserving needs of text sequence data used in machine learning and deep learning, we propose a new method for data privacy masking of text sequence datasets based on a generative adversarial network (GAN) and the differential privacy mechanism. Our work and contributions are as follows:

(1) Because a GAN can generate data close to the original data distribution, we exploit this property to design a GAN-based model for generating labeled text sequences. The generator is an LSTM network, which is widely used for text generation; the discriminator is a CNN, which is efficient at text classification. The generator and the discriminator play a game against each other and are optimized alternately to produce a new dataset. So that the data can be used for supervised learning, we designed a method that directly and effectively generates labeled data according to the categories of the training data. Because the model itself learns and describes the data, manual work is reduced: this overcomes the drawback of traditional methods, which require manually designed rules to define and locate privacy attributes, rules that are difficult to build for general tasks. The usability of the resulting data is close to that of the original data, and because the generated data is a rewrite or obfuscation of the original data, with no one-to-one correspondence between generated and original samples, privacy is improved.

(2) We further introduce differential privacy into the model, training a differentially private discriminator by clipping gradients and adding noise; this limits the influence of each individual sample on the final model and introduces randomness into it. Based on a thorough survey of existing privacy attacks on machine learning models, we designed and implemented a membership inference attack against our model and used it as the privacy test for the model. The test results show that after differential privacy is introduced, the privacy of the model improves and the success rate of the membership inference attack drops; that is, it is difficult for an attacker to recover the original data by mounting a membership inference attack on the model. At the same time, the generated data shows high usability: under some downstream models, especially neural networks such as RNNs and CNNs, the usability of the generated data is even higher than that of the original data.
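The alternating game between the LSTM generator and the CNN discriminator in contribution (1) can be written with the standard conditional GAN objective. This is an assumption: the abstract does not state how class labels are fed to the networks, and training one GAN per class would be an alternative way to obtain labeled output. Here $x$ is a text sequence, $y$ its class label, $z$ a noise vector, $G$ the generator and $D$ the discriminator:

```latex
\min_{G}\max_{D}\;
\mathbb{E}_{(x,y)\sim p_{\text{data}}}\big[\log D(x \mid y)\big]
+ \mathbb{E}_{z\sim p_z,\; y\sim p(y)}\big[\log\big(1 - D(G(z \mid y) \mid y)\big)\big]
```

$D$ takes an ascent step on this objective and $G$ a descent step, in alternation. Because $G$ emits discrete tokens, gradients from $D$ cannot flow through token sampling directly; text GANs usually work around this with a policy-gradient reward (as in SeqGAN) or a Gumbel-softmax relaxation, and the abstract does not say which approach the thesis uses.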
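Contribution (2) trains the discriminator with gradient clipping and added noise. A minimal sketch of the standard DP-SGD update is shown below, assuming per-example gradients are already available; plain Python lists stand in for tensors, and the function names and hyperparameter values (`clip_norm`, `noise_multiplier`, `lr`) are hypothetical, not taken from the thesis. Clipping bounds how much any single training sample can move the summed gradient, and Gaussian noise scaled to that bound adds the randomness; tracking the resulting privacy budget ε would need a separate privacy accountant, which is not shown.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def l2_norm(v: Sequence[float]) -> float:
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def clip_gradient(grad: Sequence[float], clip_norm: float) -> Vector:
    """Rescale one per-example gradient so its L2 norm is at most clip_norm."""
    norm = l2_norm(grad)
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(
    params: Sequence[float],
    per_example_grads: Sequence[Sequence[float]],
    clip_norm: float,
    noise_multiplier: float,
    lr: float,
    rng: random.Random,
) -> Vector:
    """One DP-SGD update: clip each example's gradient, sum, add noise, average, step."""
    dim = len(params)
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    # Sum the clipped gradients coordinate by coordinate.
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Gaussian noise with std = noise_multiplier * clip_norm; clipping bounds
    # each example's contribution to the sum by clip_norm.
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    batch_size = len(per_example_grads)
    return [p - lr * n / batch_size for p, n in zip(params, noisy)]


if __name__ == "__main__":
    rng = random.Random(0)
    params = [0.0, 0.0]
    # Hypothetical per-example gradients of the discriminator loss.
    grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
    print(dp_sgd_step(params, grads, clip_norm=1.0, noise_multiplier=1.1, lr=0.1, rng=rng))
```

Only the discriminator would run this update; since the generator learns about the training data solely through the discriminator's feedback, privatizing the discriminator is the usual rationale for protecting the generator as well in DP-GAN setups.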
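The abstract does not describe the internals of the membership inference attack used as the privacy test. As a simpler stand-in (not necessarily the thesis's attack, which may rely on shadow models or a different signal), the sketch below uses a threshold attack: given a per-sample score such as the discriminator's probability that a sample is real, the attacker guesses "member of the training set" when the score reaches a chosen threshold. The function names, scores, and threshold are hypothetical, and equal-sized member and non-member sets are assumed.

```python
from typing import List, Sequence


def predict_membership(scores: Sequence[float], threshold: float) -> List[bool]:
    """Guess 'member' (True) when a sample's score reaches the threshold."""
    return [s >= threshold for s in scores]


def attack_accuracy(
    member_scores: Sequence[float],
    nonmember_scores: Sequence[float],
    threshold: float,
) -> float:
    """Fraction of samples whose membership the threshold attack guesses correctly."""
    hits = sum(predict_membership(member_scores, threshold))
    hits += sum(not p for p in predict_membership(nonmember_scores, threshold))
    return hits / (len(member_scores) + len(nonmember_scores))


if __name__ == "__main__":
    # Hypothetical discriminator "real" scores: training members vs held-out samples.
    members = [0.95, 0.90, 0.80, 0.60]
    nonmembers = [0.70, 0.40, 0.30, 0.10]
    print(attack_accuracy(members, nonmembers, threshold=0.75))  # 0.875
```

With balanced sets, an accuracy near 0.5 means the attacker does no better than a coin flip; a lower attack success rate after adding differential privacy corresponds to the scores of members and non-members becoming harder to tell apart.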
Keywords/Search Tags: Privacy Preserving, Machine Learning, Generative Adversarial Network, Differential Privacy, Data Privacy Masking