Research On High Dimensional Privacy Data Publishing Method Based On Probability Graph Model

Posted on: 2024-07-08
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Cheng
Full Text: PDF
GTID: 2568307130472774
Subject: Computer Science and Technology
Abstract/Summary:
Data publishing is the main way to share data. Datasets to be published are often high dimensional, with a large number of attributes. Publishing such data offers great research value and supports a wide range of data mining tasks, but it also increases the risk of leaking private information. The most advanced privacy protection method for this problem is differential privacy, which provides strong protection for published data and does not depend on the attacker's background knowledge. However, applying differential privacy directly does not handle high dimensional data effectively, especially when the input dataset contains a large number of attributes: it requires injecting a large amount of noise into the dataset, which makes the utility of the published data very low. Building on existing research, this dissertation proposes solutions to two problems: privacy disclosure during high dimensional data publishing, and the poor utility of high dimensional datasets published under direct differential privacy protection. The main research contents are as follows:

(1) A CJT algorithm for private publishing of high dimensional data is proposed. The algorithm constructs a dependency graph between attributes using mutual information, partitions the attribute set of the original high dimensional data with the junction tree algorithm, and finally applies a differential privacy mechanism and synthesizes a new high dimensional dataset for publishing. Theoretical analysis shows that the CJT algorithm satisfies centralized differential privacy. Comparative experiments with existing methods show that the CJT algorithm achieves higher accuracy.

(2) To address the problem of untrustworthy central servers, JT-LDP, a high dimensional data publishing algorithm based on local differential privacy, is proposed by combining the junction tree algorithm with local differential privacy. Comparative experiments with existing methods show that the private high dimensional data synthesized by the JT-LDP algorithm has higher utility.

(3) Combining differential privacy with Bayesian networks, a PBN algorithm is proposed. The algorithm first obtains the maximum degree of the Bayesian network from a dependency graph, and then selects the head node by information entropy in order to construct a more accurate Bayesian network. The Bayesian network is then used to reduce the dimensionality of the high dimensional data, and finally a differential privacy mechanism is applied to synthesize a new high dimensional dataset for publishing. Experiments show that the PBN algorithm outperforms existing data publishing algorithms and is more efficient.
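Two building blocks named in the abstract, a dependency graph built from mutual information and a differential privacy mechanism applied to low dimensional marginals, can be illustrated with a minimal Python sketch. This is not the thesis's CJT, JT-LDP, or PBN algorithm. The function names, the threshold rule for adding edges, and the choice of the Laplace mechanism are assumptions made for illustration, and the junction tree construction itself is omitted. The sketch also computes mutual information on the raw data without noise, so only the marginal step is private here; a complete private pipeline would have to spend privacy budget on the graph construction as well.

```python
import math
import random
from collections import Counter
from itertools import combinations, product


def mutual_information(rows, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(rows)
    joint = Counter((r[i], r[j]) for r in rows)
    left = Counter(r[i] for r in rows)
    right = Counter(r[j] for r in rows)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a, b) * log(p(a, b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (left[a] * right[b]))
    return mi


def dependency_graph(rows, threshold):
    """Hypothetical rule: link attribute pairs whose MI exceeds threshold."""
    d = len(rows[0])
    return [(i, j) for i, j in combinations(range(d), 2)
            if mutual_information(rows, i, j) > threshold]


def laplace_noise(scale, rng):
    """One Laplace(0, scale) draw: the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_marginal(rows, attrs, domains, epsilon, rng):
    """Laplace-perturbed contingency table over the attribute subset attrs.

    Assumes add/remove-one-record neighbouring datasets, so one record
    changes exactly one cell by 1 and the L1 sensitivity is 1.
    """
    counts = Counter(tuple(r[a] for a in attrs) for r in rows)
    cells = product(*(domains[a] for a in attrs))
    return {cell: counts.get(cell, 0) + laplace_noise(1.0 / epsilon, rng)
            for cell in cells}


# Toy data: attribute 1 copies attribute 0, attribute 2 is independent.
rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)] * 5
domains = [[0, 1], [0, 1], [0, 1]]

edges = dependency_graph(rows, threshold=0.1)
print(edges)  # → [(0, 1)]

rng = random.Random(0)
for edge in edges:
    # Each dependent pair is treated as one low dimensional marginal.
    print(edge, noisy_marginal(rows, edge, domains, epsilon=1.0, rng=rng))
```

The point of the decomposition is that noise is added to a few small tables rather than to one table over all attributes, whose number of cells grows exponentially with the number of attributes. That growth is why directly perturbing the full high dimensional distribution gives poor utility.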
Keywords/Search Tags: Differential privacy, High dimensional data, Probability graph model, Bayesian network, Junction tree