Research On Privacy-preserving Classification Technology Based On Differential Privacy
Posted on:2023-12-15 | Degree:Master | Type:Thesis | Country:China | Candidate:L W Guo | GTID:2556307061453784 | Subject:Computer Science and Technology

Abstract/Summary:

As data sharing has become more convenient, data leakage incidents have occurred frequently in recent years, and society is paying increasing attention to data security. China enacted the Personal Information Protection Law and the Data Security Law in 2021. In big data mining applications, data security problems are prominent, and privacy protection in data mining has become a research hotspot. Classification is an important function in data mining: it predicts events and states that have not yet happened from historical data and thereby supports enterprise decision-making. Classification has been widely applied in big data scenarios. Existing privacy-preserving decision tree construction methods and privacy-preserving logistic regression methods suffer from insufficient classification accuracy. To overcome this shortcoming, the thesis improves privacy-preserving classification technology based on differential privacy, so that data security is protected while classification accuracy is maintained. The main work of the thesis includes:

(1) Existing methods add differential privacy noise directly to query count values, which reduces the usability of the counts and degrades the classification accuracy of the constructed decision trees. To solve this problem, DP-DTC, a privacy-preserving decision tree construction method based on differential privacy, is proposed. A classification gain matrix is designed to store the count values required to calculate the information gain, and a perturbation method for this matrix based on differential privacy is proposed to protect individual data privacy. A reconstruction method for the perturbed classification gain matrix is designed according to a consistency constraint so that the distribution of category labels is maintained. A reuse scheme for count values is designed to avoid redundant queries, which increases the privacy budget available to each individual query. An adaptive privacy budget dividing method, ADP-BD, is designed to address the lower signal-to-noise ratios of count values at deeper levels of the tree. Together, these techniques improve the classification accuracy of the constructed decision tree.

(2) In the data sharing scenario, privacy-preserving training data are not sufficiently accurate to support logistic regression modeling. To address this problem, LRDG, a data generation model based on the Generative Adversarial Network, is constructed. In LRDG, the generator network weights are constrained by the average distance between data groups in order to maintain the classification accuracy of a logistic regression model trained on the generated data. A perturbation method for the LRDG generator based on differential privacy is designed to protect individual data privacy. Finally, based on LRDG, DP-LRDR, a differential privacy data releasing method oriented to logistic regression, is proposed to achieve differential-privacy-preserving logistic regression modeling.

Theoretical analysis and experimental results show that the proposed methods can maintain classification accuracy while protecting individual data privacy.

Keywords/Search Tags: Privacy-preserving, Differential privacy, Classification, Decision Tree, Logistic Regression, Generative Adversarial Network
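
The abstract does not give the details of DP-DTC or ADP-BD, so the following is only a generic sketch of the baseline idea that part (1) builds on: count values are perturbed with Laplace noise, the noisy counts are used to compute information gain, and the total privacy budget is divided across tree levels. Every name here (`perturb_counts`, `clamp_counts`, `geometric_budget_split`, the `ratio` parameter) is hypothetical. The sketch assumes a sensitivity of 1 for disjoint count cells, and the geometric split that gives deeper levels more budget is an illustrative stand-in, not the thesis's adaptive method. Clipping negative counts to zero is likewise a simple placeholder for the consistency-based reconstruction described above.

```python
import math
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_counts(counts, epsilon, rng, sensitivity=1.0):
    """Add Laplace noise to every cell of {attribute_value: {label: count}}.

    Assumption: cells are disjoint, so one record changes one cell by 1.
    """
    scale = sensitivity / epsilon
    return {
        value: {label: n + laplace_noise(scale, rng) for label, n in row.items()}
        for value, row in counts.items()
    }


def clamp_counts(noisy):
    """Placeholder post-processing: clip negative noisy counts to zero."""
    return {
        value: {label: max(0.0, n) for label, n in row.items()}
        for value, row in noisy.items()
    }


def entropy(label_counts):
    """Shannon entropy (base 2) of a list of non-negative counts."""
    total = sum(label_counts)
    if total <= 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in label_counts if n > 0)


def information_gain(counts):
    """Information gain of splitting on the attribute described by counts."""
    labels = sorted({label for row in counts.values() for label in row})
    class_totals = [
        sum(row.get(label, 0.0) for row in counts.values()) for label in labels
    ]
    total = sum(class_totals)
    if total <= 0:
        return 0.0
    conditional = sum(
        (sum(row.values()) / total) * entropy(list(row.values()))
        for row in counts.values()
    )
    return entropy(class_totals) - conditional


def geometric_budget_split(total_epsilon, depth, ratio=1.5):
    """Hypothetical split: level d gets weight ratio**d, normalised to total_epsilon."""
    weights = [ratio ** d for d in range(depth)]
    norm = sum(weights)
    return [total_epsilon * w / norm for w in weights]


if __name__ == "__main__":
    rng = random.Random(0)
    table = {
        "sunny": {"yes": 2, "no": 3},
        "overcast": {"yes": 4, "no": 0},
        "rainy": {"yes": 3, "no": 2},
    }
    budgets = geometric_budget_split(total_epsilon=1.0, depth=3)
    noisy = clamp_counts(perturb_counts(table, budgets[0], rng))
    print("exact gain:", information_gain(table))
    print("noisy gain:", information_gain(noisy))
```

Because the per-level budgets sum to the total, the tree as a whole would satisfy the overall budget under sequential composition across levels, provided the nodes within a level partition the data. Whether DP-DTC accounts for its budget in this way is not stated in the abstract.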