Differentially Private High-Dimensional Data Publication Via Probabilistic Graphical Model

Posted on: 2020-11-03
Degree: Master
Type: Thesis
Country: China
Candidate: F Q Wei
GTID: 2428330590995623
Subject: Information security
Abstract/Summary:
With the emergence of the big data era, large amounts of user data are generated and accumulated, and many popular applications today provide personalized and intelligent services based on user data. Privacy protection for high-dimensional data has therefore become a research hotspot. Differential privacy is widely recognized in industry as a practical standard for privacy protection because it withstands attacks regardless of the adversary's background knowledge and quantifies the level of protection. Although differential privacy can effectively handle simple relational data, releasing high-dimensional data under differential privacy still poses many challenges. Current research therefore focuses on reducing the data dimensionality and simplifying the relationships between attributes so that the published data retain reasonable accuracy and utility.

This thesis studies the problems of publishing high-dimensional data under differential privacy. Starting from the idea of abstracting problems in specific applications into the problem of computing the probability distribution of certain variables in a probabilistic model, it proposes a method for differentially private publication of high-dimensional data based on the Probabilistic Graphical Model. To address challenges such as complex attribute relationships, high computational complexity and data sparsity, the Markov Network Model is selected to publish high-dimensional data with a differential privacy guarantee. Specifically, a Markov model is used to represent the mutual relationships between attributes, and approximate inference is then used to compute the joint distribution of the high-dimensional data under differential privacy. Relationships between variables in real life are of two kinds, directed and undirected, but the Markov Network Model can only handle undirected relations. The Chain Graph Model is therefore proposed to refine the complex relationships between attributes, and different processing techniques are applied to the different kinds of relationship. This further improves the accuracy of the data and extends the range of applications of the algorithm.

A series of experiments on real datasets demonstrates that the proposed differentially private high-dimensional data publishing methods based on the Markov Network and the Chain Graph outperform the other methods compared: they preserve the useful information in high-dimensional data well and make the published high-dimensional synthetic datasets more effective under the differential privacy guarantee.
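The abstract does not spell out the thesis's algorithms, so the following is only a minimal sketch of the general idea of publishing synthetic data from a differentially private graphical model, not the author's method. It assumes discrete attributes connected in a simple chain A1 - A2 - ... - An (the simplest tree-shaped Markov network), at least two attributes, add/remove-one-record neighboring datasets, and the Laplace mechanism applied to the count tables of adjacent attribute pairs. For such a chain the joint distribution factorizes as P(a1) times the product of P(a_{i+1} | a_i), so synthetic records can be drawn attribute by attribute from the noisy tables alone. The function names `noisy_pair_marginals` and `sample_chain` and the toy data are hypothetical.

```python
import random
from collections import Counter
from itertools import product


def laplace(rng, scale):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_pair_marginals(records, domains, epsilon, rng):
    """Release the count table of every adjacent attribute pair (i, i+1) under epsilon-DP.

    Assumption: neighbors differ by adding or removing one record. That changes each of
    the n-1 pair tables by 1 in exactly one cell, so the L1 sensitivity of the whole
    release is n-1, and Laplace noise with scale (n-1)/epsilon per cell gives epsilon-DP.
    """
    n = len(domains)
    scale = (n - 1) / epsilon
    marginals = []
    for i in range(n - 1):
        counts = Counter((r[i], r[i + 1]) for r in records)
        table = {}
        for a, b in product(domains[i], domains[i + 1]):
            # Clipping negative noisy counts to zero is post-processing, so DP is kept.
            table[(a, b)] = max(0.0, counts[(a, b)] + laplace(rng, scale))
        marginals.append(table)
    return marginals


def sample_chain(marginals, domains, m, rng):
    """Draw m synthetic records from the chain model P(a1) * prod P(a_{i+1} | a_i),
    with every factor read off the noisy pair tables (no raw data is touched here)."""

    def draw(weights):
        values = list(weights)
        w = [weights[v] for v in values]
        if sum(w) <= 0:  # every cell was clipped to zero: fall back to uniform
            w = [1.0] * len(values)
        return rng.choices(values, weights=w, k=1)[0]

    synthetic = []
    for _ in range(m):
        # P(a1) is taken from the row sums of the first pair table.
        row = [draw({a: sum(marginals[0][(a, b)] for b in domains[1]) for a in domains[0]})]
        for i, table in enumerate(marginals):
            prev = row[-1]
            # P(a_{i+1} | a_i) is proportional to row `prev` of the noisy (i, i+1) table.
            row.append(draw({b: table[(prev, b)] for b in domains[i + 1]}))
        synthetic.append(tuple(row))
    return synthetic


# Usage: three two-valued attributes and 150 toy records (hypothetical data).
rng = random.Random(0)
domains = [("young", "old"), ("low", "high"), ("no", "yes")]
records = [("young", "low", "no"), ("young", "high", "yes"), ("old", "high", "yes")] * 50
marginals = noisy_pair_marginals(records, domains, epsilon=1.0, rng=rng)
synthetic = sample_chain(marginals, domains, m=150, rng=rng)
print(len(synthetic), synthetic[0])
```

Because each pair table is noised separately, tables (i-1, i) and (i, i+1) need not agree on the counts of attribute i; the sketch simply uses each table's rows as they are. Cyclic dependencies in a general Markov network, or the mixed directed and undirected edges of a chain graph, call for the inference machinery the abstract describes and are beyond this sketch.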
Keywords/Search Tags: High-dimensional data, Differential privacy, Data publication, Probabilistic Graphical Model, Markov Network, Chain Graph