
Sensitivity-aware High-dimensional Data Differential Privacy Protection Method

Posted on: 2020-07-29  Degree: Master  Type: Thesis
Country: China  Candidate: C F Luo  Full Text: PDF
GTID: 2428330596973756  Subject: Computer Science and Technology
Abstract/Summary:
Digital technology makes it easy for organizations of all kinds to collect large amounts of personal information, such as medical records and web search histories. From these data, much latent information and many patterns can be mined or extracted to provide accurate, dynamic, and reliable predictions for organizations and individuals. For example, people's historical medical records and genetic information can be collected to help hospital staff diagnose and monitor patients' health more effectively, and the environmental monitoring data collected from smartphones can make urban planning more efficient and people's lives more convenient. Shafi Goldwasser, Turing Award laureate and director of the Simons Institute for the Theory of Computing at the University of California, Berkeley, has observed that a central concern of the big-data era is how to compute on data while protecting its privacy. Privacy issues, however, remain a major obstacle to data collection and analysis.

Group-based privacy models, represented by k-anonymity, achieve privacy protection through generalization and suppression. These models, however, are vulnerable to attacks exploiting the adversary's background knowledge, and they provide no quantitative analysis of the degree of protection achieved. The differential privacy model addresses this problem by injecting random noise into statistical results. Differential privacy is rapidly gaining popularity because it provides rigorous protection against adversaries with arbitrary background knowledge: its main advantage is that it guarantees privacy no matter how much background knowledge the attacker holds, while giving a quantitative analysis of the risk of privacy disclosure. How to improve the utility of statistical results while satisfying differential privacy is a key open issue for the model.

However, Kifer first pointed out in 2011 that differential privacy cannot achieve good protection on correlated data. Some studies quantify the correlation between attributes with mutual information or similar measures in order to allocate the privacy budget and reduce noise, but doing so destroys the correlation between attributes and thus sacrifices utility for data analysis. In addition, existing research rarely considers the differing sensitivity of individual attributes. The mainstream practice is to divide attributes into sensitive and non-sensitive ones and then compute the degree of association between the two groups to configure the noise parameters, but artificially separating sensitive from non-sensitive attributes also introduces many errors.

To address these problems, this thesis takes correlated data as its research object and, combining the differential privacy model with rough set theory, studies privacy protection for correlated data in detail. The main research work includes the following aspects:

(1) First, the current state of research on privacy protection models is briefly reviewed. We show that k-anonymity-based models are vulnerable to background-knowledge attacks, and that although the differential privacy model resists such attacks, it does not apply directly to correlated data, because correlations among records pose an additional risk of privacy disclosure. We also point out the significance of data-correlation analysis in real life.
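For concreteness, the following is a minimal sketch of the standard Laplace mechanism that the abstract refers to when it speaks of injecting random noise into statistical results. It is generic textbook code, not the thesis's implementation; the function name and the example query are illustrative only.

    import numpy as np

    def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
        # Return the statistic perturbed with Laplace(0, sensitivity/epsilon)
        # noise, which satisfies epsilon-differential privacy for a query
        # with the given global sensitivity.
        rng = np.random.default_rng() if rng is None else rng
        return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

    # A count query has global sensitivity 1: adding or removing one record
    # changes the answer by at most 1.
    noisy_count = laplace_mechanism(true_value=1024, sensitivity=1.0, epsilon=0.5)

A smaller epsilon (a tighter privacy budget) widens the noise scale, which is exactly the privacy-utility trade-off discussed above.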
(2) To handle the correlation between attributes, we first use the notion of attribute dependency from rough set theory to measure how strongly attributes are associated, and on this basis propose a differential privacy protection method based on rough set theory; to inject different amounts of noise for different degrees of association, the attributes are grouped by their dependency degree (a minimal illustration of the dependency computation is sketched below). Second, considering that attributes differ in sensitivity, information entropy is used to measure attribute sensitivity, since entropy measures the amount of information relative to background knowledge: the greater the entropy, the greater the uncertainty, and the more information an adversary needs to pin a value down. We therefore have reason to believe that the greater the uncertainty, the smaller the chance of a successful attack. On this basis, the thesis proposes the concept of information-entropy differential privacy. A security analysis shows that the proposed model performs better on attributes taking more than two values than on binary (yes/no) attributes, but in both cases it still outperforms the traditional uniform allocation of the privacy budget.

(3) Based on the proposed rough-set-based differential privacy method, the Laplace mechanism is used to perturb the statistical results. The overall requirements of the system are stated, its overall design and implementation are introduced, and, following the requirements analysis, the implementation steps of each sub-module of the algorithm are described in detail.

(4) Experiments were carried out on three real data sets: NLTCS, Adult, and a Kaggle competition data set. Using metrics common in differential privacy research, the performance of the method is measured by its error under varying parameters; data utility is verified by the average error over different query sizes and by the total error of multiple random queries under different privacy budgets (a sketch of this measurement appears below). The error results show that the proposed method preserves the usefulness of the data.
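Item (2) relies on the rough-set dependency degree gamma_C(D) = |POS_C(D)| / |U|: the fraction of records whose equivalence class under the condition attributes C determines the decision attribute D unambiguously. The sketch below, with hypothetical record and attribute names, shows one straightforward way to compute it; the thesis's exact grouping of attributes by dependency may differ.

    from collections import defaultdict

    def dependency_degree(rows, condition_attrs, decision_attr):
        # Partition the universe into equivalence classes induced by the
        # condition attributes, then count the records lying in classes
        # that are consistent on the decision attribute (the positive region).
        classes = defaultdict(list)
        for row in rows:
            key = tuple(row[a] for a in condition_attrs)
            classes[key].append(row[decision_attr])
        positive = sum(len(v) for v in classes.values() if len(set(v)) == 1)
        return positive / len(rows)

    # Hypothetical toy records for illustration only.
    rows = [
        {"age": "30-40", "sex": "F", "disease": "flu"},
        {"age": "30-40", "sex": "F", "disease": "flu"},
        {"age": "30-40", "sex": "M", "disease": "cold"},
        {"age": "40-50", "sex": "M", "disease": "flu"},
    ]
    gamma = dependency_degree(rows, ["age", "sex"], "disease")  # 1.0 here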
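Item (2) also measures attribute sensitivity with Shannon entropy and, by the reasoning that greater uncertainty means a smaller chance of attack, gives higher-entropy attributes a larger share of the privacy budget and hence less noise. The sketch below is one plausible, proportional reading of that allocation rule; the thesis may use a different weighting.

    import math
    from collections import Counter

    def shannon_entropy(values):
        # H(X) = -sum p(x) * log2 p(x), from the empirical distribution.
        counts = Counter(values)
        n = len(values)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    def allocate_budget(columns, total_epsilon):
        # Split the total budget across attributes in proportion to entropy:
        # a more uncertain attribute is treated as less exposed and receives
        # a larger epsilon, i.e. a smaller noise scale.
        entropies = {a: shannon_entropy(col) for a, col in columns.items()}
        total = sum(entropies.values()) or 1.0  # guard the all-constant case
        return {a: total_epsilon * h / total for a, h in entropies.items()}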
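Finally, item (4) evaluates utility by query error under different privacy budgets. A minimal version of that measurement loop might look as follows; the query workload and budget grid are placeholders, not the paper's experimental settings.

    import numpy as np

    def average_query_error(true_answers, noisy_answers):
        # Mean absolute error over a workload of count queries.
        true_answers = np.asarray(true_answers, dtype=float)
        noisy_answers = np.asarray(noisy_answers, dtype=float)
        return float(np.mean(np.abs(true_answers - noisy_answers)))

    rng = np.random.default_rng(0)
    true_answers = rng.integers(0, 1000, size=50)  # placeholder workload
    for epsilon in (0.1, 0.5, 1.0):                # placeholder budget grid
        noisy = true_answers + rng.laplace(0.0, 1.0 / epsilon, size=true_answers.size)
        print(epsilon, average_query_error(true_answers, noisy))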
Keywords/Search Tags: privacy protection, differential privacy, correlated data, rough set theory