
Research On Differential Privacy Protection Based On Classified Data

Posted on: 2022-09-04
Degree: Master
Type: Thesis
Country: China
Candidate: M S Li
GTID: 2518306344951219
Subject: Computer Software and Application of Computer
Abstract/Summary:
With the rapid development of Internet technology, a large amount of data is generated every day. These data are highly valuable for social development and for scientific and technological progress. However, they also pose unprecedented challenges to personal privacy. In recent years, privacy breaches have occurred frequently and have exposed a wide range of data types, including categorical data such as gender, education, and religious belief. An attacker can steal a user's private information either by directly obtaining the original categorical data or by analyzing it. It is therefore important to protect the privacy of categorical data. Compared with traditional privacy protection techniques, differential privacy (DP), which rests on a rigorous mathematical definition, can resist background-knowledge attacks. DP also allows the risk of privacy leakage to be analyzed quantitatively while retaining good data utility. How to use DP to protect categorical data is therefore an important research topic.

In data processing, the two stages most prone to privacy leakage are data collection and data analysis, and this thesis addresses the privacy protection of categorical data in both. In the data collection stage, to prevent leakage of users' original categorical data, a categorical data collection mechanism based on local differential privacy (LDP) is proposed. Users perturb their categorical data before providing it to the data collector, so that their privacy is not leaked during collection or any subsequent processing. In the data analysis stage, if the original data have been collected directly, analyzing them may leak additional private information about users. For example, k-modes, the most common clustering algorithm for categorical data, may disclose private information about individual sample points or about the data set. Therefore, to ensure that users' privacy is not leaked during k-modes clustering analysis, this thesis improves the k-modes algorithm on the basis of ε-centralized differential privacy.

The principal research work of the thesis is as follows:

(1) An easy-to-implement categorical data collection mechanism based on local differential privacy is proposed. Because some existing LDP mechanisms are relatively complex to implement, this mechanism protects categorical data through five simple steps: encoding, noise adding, integer approximation, modulo, and decoding. For one-dimensional and multi-dimensional categorical data, the thesis gives the corresponding collection algorithm and transition probabilities, and analyzes the utility and privacy of each algorithm. Here, utility is defined as the accuracy of the output data relative to the input data, and privacy is measured by the "true" privacy level of the LDP mechanism, that is, the minimum privacy level. In addition, experiments under different noise distributions and data distributions verify and explore several properties of the mechanism. The experimental results show that the proposed mechanism effectively protects the privacy of categorical data while preserving its utility.

(2) A k-modes clustering algorithm for categorical data that satisfies DP is proposed. The thesis analyzes in detail how privacy can leak during k-modes clustering, and uses the Laplace mechanism and the exponential mechanism, the two most commonly used mechanisms in ε-centralized differential privacy, to perturb the selection of cluster centers and thereby protect privacy. Based on different noise distributions, three DP k-modes clustering algorithms for categorical data are proposed. The privacy of these algorithms is analyzed, and their clustering results are compared on different data sets. The experimental results show that the proposed algorithms protect the privacy of categorical data while maintaining clustering quality.
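The five steps of the collection mechanism in (1) can be illustrated with a minimal one-dimensional sketch. It is not the thesis's algorithm: the function name `perturb_category`, the ordering of categories as integer codes, the use of zero-mean Laplace noise with scale `scale`, and rounding to the nearest integer as the "integer approximation" step are all assumptions made for illustration.

```python
import random

def perturb_category(value, categories, scale, rng=random):
    """Hypothetical sketch: report a perturbed version of `value`, one of `categories`.

    Assumes Laplace noise and nearest-integer rounding; the thesis's noise
    distribution and approximation rule may differ.
    """
    k = len(categories)
    # 1. Encoding: map the label to an integer code in {0, ..., k-1}
    code = categories.index(value)
    # 2. Noise adding: zero-mean Laplace(scale) noise, built as a difference of two exponentials
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    # 3. Integer approximation: round the noisy code to the nearest integer
    noisy_code = round(code + noise)
    # 4. Modulo: wrap the result back into {0, ..., k-1} (Python's % is non-negative here)
    reported = noisy_code % k
    # 5. Decoding: map the integer code back to a category label
    return categories[reported]

# Usage: each user perturbs locally and sends only the reported label
rng = random.Random(42)
education = ["primary", "secondary", "bachelor", "master", "doctorate"]
reports = [perturb_category("bachelor", education, scale=1.0, rng=rng) for _ in range(5)]
```

A larger `scale` spreads reports over more categories, trading accuracy for privacy. Because the modulo step wraps the noisy code around, every category remains a possible output for every input, which keeps the likelihood ratio between any two inputs bounded.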
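For a mechanism of the shape described in (1) (integer code, additive noise, rounding, modulo k), the transition probability P(output j | input i) depends only on (j - i) mod k, so the transition matrix is circulant. The sketch below computes it numerically, assuming Laplace noise with scale b, and derives two quantities that match the abstract's definitions as interpreted here: utility as P(output = input), and the minimum privacy level ε* = max over i, i', j of ln(P(j|i) / P(j|i')), i.e. the smallest ε for which the mechanism satisfies ε-LDP. The function names and the truncation parameter `wraps` for the infinite sum are hypothetical.

```python
import math

def rounded_laplace_pmf(m, b):
    """P(round(N) = m) for N ~ Laplace(0, b)."""
    if m == 0:
        return 1 - math.exp(-0.5 / b)
    a = abs(m)
    return 0.5 * (math.exp(-(a - 0.5) / b) - math.exp(-(a + 0.5) / b))

def residue_probs(k, b, wraps=200):
    """q[d] = P(round(N) mod k == d), truncating the sum at |m| <= wraps * k."""
    q = [0.0] * k
    for m in range(-wraps * k, wraps * k + 1):
        q[m % k] += rounded_laplace_pmf(m, b)
    return q

def transition_matrix(k, b):
    """P[i][j] = probability that input code i is reported as code j (circulant)."""
    q = residue_probs(k, b)
    return [[q[(j - i) % k] for j in range(k)] for i in range(k)]

def accuracy(k, b):
    """Utility read as P(output == input); identical for every input code."""
    return residue_probs(k, b)[0]

def minimum_epsilon(k, b):
    """Smallest eps with P[i][j] <= exp(eps) * P[i2][j] for all i, i2, j.

    Each column of a circulant matrix contains every q[d], so the worst
    ratio is max(q) / min(q).
    """
    q = residue_probs(k, b)
    return math.log(max(q) / min(q))
```

Increasing the noise scale b lowers both `accuracy` and `minimum_epsilon`, which is the utility-privacy trade-off the thesis studies experimentally. Swapping in another noise distribution only requires replacing `rounded_laplace_pmf`.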
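To illustrate (2), the sketch below perturbs the center-update step of k-modes, where the abstract locates the privacy risk. It is a generic construction, not one of the thesis's three algorithms: each center attribute is chosen either with the exponential mechanism (utility = category count, sensitivity 1) or with report-noisy-max using Laplace noise of scale 1/ε on the counts. The random initialization from attribute domains, the even split of the total budget `eps` over iterations and attributes (sequential composition, with parallel composition across the disjoint clusters), and all function names are assumptions.

```python
import math
import random

def matching_distance(x, y):
    # k-modes dissimilarity: number of attributes on which x and y differ
    return sum(a != b for a, b in zip(x, y))

def exp_mech_select(counts, eps, rng):
    # Exponential mechanism: Pr[c] proportional to exp(eps * count(c) / 2), count sensitivity 1
    cats = list(counts)
    top = max(counts.values())  # shift by the max count for numerical stability
    weights = [math.exp(eps * (counts[c] - top) / 2) for c in cats]
    return rng.choices(cats, weights=weights, k=1)[0]

def noisy_max_select(counts, eps, rng):
    # Report-noisy-max: add Laplace(1/eps) noise to each count and return the argmax
    def laplace(scale):
        return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return max(counts, key=lambda c: counts[c] + laplace(1 / eps))

def dp_kmodes(data, domains, k, eps, iters, select=exp_mech_select, seed=0):
    """Hypothetical DP k-modes: `domains[a]` lists every value attribute a can take."""
    rng = random.Random(seed)
    d = len(domains)
    # Data-independent initial modes drawn from the attribute domains (consumes no budget)
    centers = [tuple(rng.choice(dom) for dom in domains) for _ in range(k)]
    # Budget per (iteration, attribute); disjoint clusters share it in parallel
    eps_step = eps / (iters * d)
    for _ in range(iters):
        # Assignment step: each record joins its nearest current mode
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda c: matching_distance(x, centers[c]))
            clusters[nearest].append(x)
        # Update step: pick each attribute of each new mode with a DP selector
        new_centers = []
        for members in clusters:
            center = []
            for a, dom in enumerate(domains):
                counts = {v: 0 for v in dom}
                for x in members:
                    counts[x[a]] += 1
                center.append(select(counts, eps_step, rng))
            new_centers.append(tuple(center))
        centers = new_centers
    return centers
```

With a small `eps`, the chosen modes approach uniform random draws from each domain; with a large `eps`, both selectors return the true per-cluster mode with high probability, and the procedure behaves like ordinary k-modes with random initial modes.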
Keywords: categorical data, data collection, clustering algorithm, centralized differential privacy, local differential privacy