Research On Multidimensional Correlation Hierarchical Differential Privacy Method For High-dimensional Data Publishing

Posted on: 2019-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: H X Zhao
GTID: 2428330566976142
Subject: Software engineering
Abstract/Summary:
In recent years, with the rapid development of the Internet and information technology, mobile devices have become increasingly popular. In practice, large amounts of high-dimensional data are stored across multiple distributed parties, and these data have become an important resource on today's Internet. Aggregating such high-dimensional data has great application value. For example, medical institutions can use aggregated data (patient enrollment methods, income, etc.) to offer patients more reasonable treatment options and, to some degree, reduce their economic burden. However, the original data held by each party contains a large amount of sensitive personal information (e.g., income status, disease status); publishing it directly would reveal users' private information and could expose them to unpredictable threats and irreparable harm. Privacy protection in the distributed setting is therefore an important problem, and it has become a hot topic in information security and data analysis.
A number of privacy protection models and methods have been proposed for high-dimensional data publishing in the distributed setting. They mainly reduce the dimensionality of the data and then anonymize the reduced data, for example by generalization or noise addition. The difficulty, common to existing techniques, lies in balancing data privacy against data utility: how can the best compromise between privacy protection and data utility be achieved?
This question will also remain a focus of future research on data publishing. Data utility is an important indicator of whether data analysis can be carried out. To better preserve utility, the privacy protection method adopted must take the purpose of the data analysis into account, so that data publishing achieves the best balance between privacy protection and data utility.
Traditional privacy protection technologies follow two mainstream models. One is the k-anonymity model and its extensions (such as the l-diversity and t-closeness models). The main idea is to define the attributes of the published data that can be linked to other publicly released data as a quasi-identifier, and to generalize the values of the quasi-identifier attributes so that each tuple appears at least k (k ≥ 2) times in the tuple multiset. When the published data is linked to other published data through the quasi-identifier attributes, the entity information of each tuple in the resulting table is indistinguishable from that of at least k-1 other tuples, which protects the private information of the entity. However, this model does not provide satisfactory protection for high-dimensional data publishing, mainly because high-dimensional data has far more dimensions than ordinary data, so applying k-anonymity causes excessive information loss.
The other is the differential privacy model. Its biggest difference from the k-anonymity model is that it assumes the attacker has all background knowledge except knowledge of the attack target. It has a strict formal definition and adds appropriate noise to query or analysis results. Noise protects privacy, but the amount of noise added directly affects the usefulness of the data. Differential privacy also raises the problem of how to allocate the privacy budget. In existing differential privacy methods, the privacy budget is mostly allocated evenly; as a result, some participants are overprotected, while others are insufficiently protected, which increases the risk of privacy leakage.
To address these problems in high-dimensional data publishing under multiple privacy requirements, this thesis proposes a Multidimensional Correlation Hierarchical Differential Privacy (MuCH-DP) method. The main research work is as follows.
Firstly, this thesis presents the background and significance of the study. Based on the current state of research on privacy protection, it then analyzes the existing problems; in particular, for high-dimensional data publishing under multiple privacy requirements, it points out the technical limitations of existing models and methods.
Secondly, we analyze the privacy issues of differential privacy protection models in the distributed setting. The main problem is that the high dimensionality of the data increases the required noise so much that data utility is greatly reduced. This thesis proposes a privacy protection scheme for multiple privacy requirements that takes into account both the differing sensitivity of multidimensional attributes and the correlation between attributes. We design the MuCH-DP method, which adds noise only to sensitive attributes and to non-sensitive attributes that are highly correlated with sensitive attributes. The remaining non-sensitive attributes are considered not to cause privacy leakage and are published directly. Compared with traditional differential privacy methods, noise is added more selectively, which controls the total amount of added noise and achieves better data utility.
Then, within this scheme, we design a personalized privacy budget allocation scheme based on the correlation between attributes. Traditional even allocation of the privacy budget does not take into account the differing sensitivity of attributes, which inevitably leads to protection that is too weak for some local data sets and too strong for others, and so fails to meet the privacy requirements of each local database. We therefore propose a personalized privacy budget allocation strategy that allocates the privacy budget more reasonably, changes the data less, and provides more data utility. We use the Laplace mechanism to add noise to the data and prove that the method satisfies ε-differential privacy.
Finally, this thesis gives a detailed system design and verifies the feasibility of the method through comparative experiments. The experiments also show that, compared with previously proposed methods for high-dimensional data publishing, the method changes the data less and improves utility.
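The abstract does not give the concrete MuCH-DP algorithm, so the following Python sketch only illustrates the general shape of the three ideas it names: selecting which attributes receive noise, splitting the privacy budget unevenly, and perturbing with the Laplace mechanism. Everything specific here is an assumption made for illustration and is not taken from the thesis: the function names, the use of Pearson correlation, the correlation `threshold`, the inverse-weight allocation rule, and the per-attribute `sensitivity` values are all hypothetical. The sketch makes no claim to reproduce the formal privacy proof mentioned in the abstract.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def select_noisy_attributes(columns, sensitive, threshold):
    """Return {attribute: weight} for the attributes that will receive noise.

    Assumption: sensitive attributes get weight 1.0; a non-sensitive attribute
    gets its largest |correlation| with any sensitive attribute and is kept
    only if that value reaches the (hypothetical) threshold.
    """
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    weights = {name: 1.0 for name in sensitive}
    for name, values in columns.items():
        if name in sensitive:
            continue
        strongest = max(abs(pearson(values, columns[s])) for s in sensitive)
        if strongest >= threshold:
            weights[name] = strongest
    return weights


def allocate_budget(weights, total_epsilon):
    """Split total_epsilon with shares proportional to 1/weight.

    Assumption: a higher weight means a smaller epsilon share (more noise).
    The shares always sum to total_epsilon.
    """
    inverse = {name: 1.0 / w for name, w in weights.items()}
    norm = sum(inverse.values())
    return {name: total_epsilon * v / norm for name, v in inverse.items()}


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def publish(columns, sensitive, sensitivity, total_epsilon, threshold, rng):
    """Perturb only the selected attributes; copy the rest unchanged."""
    weights = select_noisy_attributes(columns, sensitive, threshold)
    budget = allocate_budget(weights, total_epsilon)
    released = {}
    for name, values in columns.items():
        if name in budget:
            scale = sensitivity[name] / budget[name]  # Laplace scale = sensitivity / epsilon
            released[name] = [v + laplace_noise(scale, rng) for v in values]
        else:
            released[name] = list(values)
    return released, budget


# Toy data (made up): "income" is sensitive, "spending" tracks it closely,
# "shoe" is only weakly correlated with it.
columns = {
    "income": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "spending": [2.0, 4.0, 6.0, 8.0, 10.0, 13.0],
    "shoe": [3.0, 1.0, 3.0, 1.0, 3.0, 1.0],
}
sensitivity = {"income": 5.0, "spending": 11.0, "shoe": 2.0}  # hypothetical value ranges
released, budget = publish(columns, {"income"}, sensitivity,
                           total_epsilon=1.0, threshold=0.5,
                           rng=random.Random(0))
print(budget)
```

In this toy run only `income` and `spending` are perturbed, while `shoe` is released as-is. Note that choosing the noisy attributes from correlations computed on the raw data is itself data-dependent; how the thesis handles that step is not stated in the abstract.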
Keywords/Search Tags:Multiple privacy requirements, High-dimensional data, Multidimensional correlation, Hierarchical, Differential privacy