
Differential Privacy Data Releasing Based On Bayesian Network

Posted on: 2022-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: X J Qi
Full Text: PDF
GTID: 2518306509454574
Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet and growing public awareness of privacy protection, differential privacy, a privacy protection method with rigorous mathematical guarantees, has become a research hotspot. Within this field, publishing a synthetic dataset that satisfies differential privacy has attracted attention because a single release can serve the different analysis needs of many data analysts. However, as the dimensionality of the data increases, the utility of differentially private synthetic datasets declines: high-dimensional data is highly sensitive to noise, so utility is low after noise is added. The current approach to publishing high-dimensional synthetic data is to reduce the data's sensitivity to noise by reducing the dimensionality of the dataset's attributes, thereby improving utility. However, the attributes of high-dimensional data are highly correlated, and these correlations reflect the value and significance of high-dimensional datasets. Preserving the correlations between attributes while publishing data usually takes considerable time and resources. Therefore, minimizing the noise introduced, preserving more attribute correlation, and reducing computation time, all while satisfying differential privacy, have become important research topics for publishing differentially private synthetic datasets. To address these problems, this thesis proposes the following two algorithms:

(1) APriv Bayes (Alien Priv Bayes), a synthetic data publishing algorithm based on Bayesian networks. First, the algorithm designs a multi-node network structure to reduce the amount of noise introduced. This structure effectively reduces the number of subnets and increases the privacy budget available to each subnet. At the same time, a first-node selection mechanism is designed for the multi-head node network structure, which selects a node that is highly correlated with the other nodes as the first node. In addition, the intermediate results computed during first-node selection are reused by a range filtering technique to shrink each node's candidate parent space, which reduces the amount of computation. Finally, APriv Bayes is evaluated experimentally on real datasets. The results show that it improves the utility of synthetic datasets.

(2) JTFAPB (Junction Tree of Fast Alien Priv Bayes), a junction tree algorithm based on Bayesian networks. The algorithm constructs a junction tree from a Bayesian network and generates a synthetic dataset from that junction tree. First, it proposes a fast Bayesian network construction algorithm, FAPriv Bayes (Fast Alien Priv Bayes), which reduces the time needed to construct the Bayesian network by controlling the order in which nodes join the network. Then a junction tree is constructed from the Bayesian network, and the privacy budget is allocated according to clique size. Next, inverse variance weighting is used to resolve inconsistencies between the marginals of the cliques and those of the separators. Finally, FAPriv Bayes and JTFAPB are evaluated experimentally on real datasets. The results show that FAPriv Bayes effectively reduces computation time while maintaining data utility, and that JTFAPB improves the utility of the synthetic dataset.
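The inverse variance weighting step mentioned in (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: it only shows the general technique of merging two noisy estimates of the same shared marginal (for example, one derived from each of two neighboring junction-tree nodes) by weighting each in proportion to the inverse of its noise variance. The function names `laplace_variance` and `inverse_variance_merge`, the use of Laplace noise, the sensitivity value of 1, and the example counts and budgets are all assumptions made for illustration.

```python
import numpy as np


def laplace_variance(sensitivity, epsilon):
    """Variance of Laplace noise with scale b = sensitivity / epsilon (2 * b**2)."""
    scale = sensitivity / epsilon
    return 2.0 * scale ** 2


def inverse_variance_merge(estimates, variances):
    """Merge k noisy estimates of the same marginal table.

    estimates: array-like of shape (k, n_cells), one row per noisy estimate.
    variances: array-like of shape (k,), noise variance of each estimate.
    Returns the merged table and the variance of each merged cell.
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    precision = 1.0 / variances            # less noisy estimates get more weight
    weights = precision / precision.sum()  # normalise so the weights sum to 1
    merged = weights @ estimates           # weighted average, cell by cell
    merged_variance = 1.0 / precision.sum()
    return merged, merged_variance


# Hypothetical example: the same two-cell marginal estimated twice,
# once with privacy budget 0.5 and once with privacy budget 1.0.
noisy_a = [10.0, 20.0]
noisy_b = [14.0, 16.0]
var_a = laplace_variance(sensitivity=1.0, epsilon=0.5)
var_b = laplace_variance(sensitivity=1.0, epsilon=1.0)

merged, merged_var = inverse_variance_merge([noisy_a, noisy_b], [var_a, var_b])
print(merged, merged_var)
```

The merged estimate leans toward the copy that received the larger privacy budget, and its variance is lower than that of either input, which is why this weighting is a common choice for reconciling overlapping noisy marginals.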
Keywords/Search Tags: Differential privacy, Bayesian network, data-releasing, junction tree