Research on High-dimensional Relational Data Publishing Method Satisfying Differential Privacy

Posted on: 2024-07-06
Degree: Master
Type: Thesis
Country: China
Candidate: G Y Zhang
GTID: 2568307130958479
Subject: Software engineering
Abstract/Summary:
The release of high-dimensional relational data satisfying differential privacy is a major research topic. The usual method first reduces the dimensionality of the high-dimensional relational data and then processes it with differential privacy techniques. However, existing methods still cannot process high-dimensional relational data effectively, for three reasons. First, the relationships between the attributes of high-dimensional data are complex and cannot be represented directly, and adding differential privacy noise directly to the data drowns the original data in noise, rendering it unusable. Second, the models used to capture correlations between attributes are inaccurate, which lowers the utility of the synthesized datasets. Third, the privacy budget allocation in existing schemes ignores the degree of correlation between attributes, which introduces unreasonable noise and further reduces the utility of the synthesized datasets. To address these three issues, the main contributions of this paper are as follows:
(1) To address the complex relationships between high-dimensional data attributes and the inability to represent them directly, this paper analyzes high-dimensional data in depth, selects Markov networks and Bayesian networks from among probabilistic graphical models to represent the interrelationships between attributes, and uses them to reduce the dimensionality of high-dimensional relational data.
(2) To address the low utility of synthesized datasets caused by incomplete representation of attribute correlations and insufficient capture of model information in existing research, this paper proposes DPMN, a differentially private high-dimensional data publication algorithm based on Markov networks. The algorithm introduces rough set theory to model the correlation between attributes and constructs an attribute graph by combining graph triangulation with a threshold mechanism, obtaining the set of maximal cliques. Then, using the θ-useful criterion, it reduces the candidate space of marginals in the maximal clique set. From the resulting low-dimensional marginals, it constructs a Markov network and represents the distribution of the network's nodes with an exponential family distribution. The joint distribution is calculated from the marginal distributions of the cliques, and high-dimensional data is synthesized from it for publication. Experimental results show that DPMN surpasses similar algorithms in the utility of the synthesized datasets.
(3) To address the low utility of synthesized datasets caused by factors such as random selection of the initial node in Bayesian networks and unreasonable allocation of the privacy budget in existing research, this paper proposes DPMB, a differentially private high-dimensional data publication algorithm based on the Markov blanket. The algorithm constructs a Bayesian network using maximum information entropy and computes Markov blankets from that network to cluster the attributes. The invariant post-randomization algorithm is then applied to the low-dimensional attribute clusters, and the privacy budget is allocated according to the importance of each attribute cluster to perturb the data, yielding synthesized high-dimensional data for publication. Experimental results indicate that the algorithm improves the utility of the synthesized datasets.
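Two of the steps named for DPMB, computing Markov blankets from a Bayesian network and splitting the privacy budget across attribute clusters, can be illustrated with a minimal sketch. It rests on assumptions that are not in the abstract: the toy network, the function names `markov_blanket` and `allocate_budget`, the choice of one cluster per attribute (the attribute plus its blanket), and the use of cluster size as a stand-in for cluster importance are all hypothetical. The abstract does not say how importance is measured, and the invariant post-randomization step is not shown.

```python
def markov_blanket(parents, node):
    """Markov blanket of `node` in a DAG given as {child: set_of_parents}.

    The blanket is the node's parents, its children, and the other
    parents of its children.
    """
    children = {c for c, ps in parents.items() if node in ps}
    co_parents = set()
    for child in children:
        co_parents |= parents[child]
    blanket = set(parents.get(node, set())) | children | co_parents
    blanket.discard(node)  # a node is not part of its own blanket
    return blanket


def allocate_budget(total_epsilon, weights):
    """Split total_epsilon across clusters in proportion to positive weights."""
    if total_epsilon <= 0:
        raise ValueError("total_epsilon must be positive")
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be non-empty and positive")
    total = sum(weights.values())
    return {name: total_epsilon * w / total for name, w in weights.items()}


# Hypothetical toy network: age -> income <- education, income -> loan
parents = {
    "age": set(),
    "education": set(),
    "income": {"age", "education"},
    "loan": {"income"},
}

# Assumption: one cluster per attribute, made of the attribute and its blanket.
clusters = {n: markov_blanket(parents, n) | {n} for n in parents}

# Assumption: cluster size stands in for the importance measure.
weights = {n: len(c) for n, c in clusters.items()}
budgets = allocate_budget(1.0, weights)

for name in sorted(clusters):
    print(name, sorted(clusters[name]), round(budgets[name], 4))
```

Because the per-cluster budgets sum to the total, perturbing each cluster with its own share stays within the overall budget under sequential composition. Any other positive importance score can be passed to `allocate_budget` in place of cluster size.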
Keywords/Search Tags: Differential privacy, High-dimensional relational data, Markov network, Bayesian network