
Research On Local Differential Privacy Method For High-dimensional Data Based On Improved Bayesian Network

Posted on: 2022-10-16
Degree: Master
Type: Thesis
Country: China
Candidate: T Gao
GTID: 2518306563466534
Subject: Computer technology
Abstract/Summary:
Privacy protection in data release and the prevention of user privacy leakage have attracted much attention in cyberspace security research. A recent solution is to apply the differential privacy mechanism. As a strict mathematical framework for evaluating and protecting data privacy, differential privacy performs well and has a solid theoretical foundation. However, existing differential privacy techniques cannot adequately prevent privacy leakage when high-dimensional data are released. Because high-dimensional data consist of a large number of highly correlated attributes, data released under the differential privacy mechanism have relatively poor utility. In particular, when the input data set has many attributes, existing methods must inject a large amount of noise, and the resulting distortion drastically impairs the accuracy and usability of the released data, making them almost useless. Current high-dimensional data release methods do provide a way to release such data, but most of them fail to account for the correlations among the attributes or to address the privacy threat posed by untrusted third parties. As a result, the released synthetic data have poor utility. To tackle these problems, the author conducted the following research:

(1) To capture the correlations among high-dimensional attribute sets, the author proposed an improved Bayesian-network-based local differentially private data release algorithm for high-dimensional data, Normalized information Entropy PrivBayes (NE-PrivBayes), which constructs a higher-quality Bayesian network. The method uses the expectation-maximization (EM) algorithm to estimate the joint probability distribution of the high-dimensional data set and builds a Bayesian network based on normalized information entropy and mutual information. The resulting Bayesian network can effectively restore most of the correlations among the original attribute sets.

(2) To address two problems of the existing Bayesian network construction algorithm, namely that it does not take the degree value k into account and that the initial node cannot be changed once selected, an Improved DABC-based Bayes Network Searching algorithm (IDBNS) was proposed, where DABC denotes the discrete artificial bee colony algorithm. With limited complexity, the algorithm can find a higher-quality Bayesian network structure. The method uses the discrete artificial bee colony algorithm to construct a Bayesian network and learn its structure, aiming to find a globally optimal food source in the search space according to the BIC scoring criterion.

Experiments were conducted on the Adult dataset in comparison with the PrivBayes method. The simulation results show that the NE-PrivBayes method has advantages in data utility and that the Bayesian network structure obtained by the IDBNS algorithm achieves better convergence.
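
The abstract does not give the exact formula for the normalized measure used to build the network, so the following is only a minimal sketch of one common way to score attribute pairs. It assumes the mutual information I(X;Y) is divided by min(H(X), H(Y)); this normalization and the function names `entropy`, `mutual_information`, and `normalized_mi` are illustrative assumptions, not the thesis's definitions. The sketch also works on raw values and omits the local differential privacy perturbation and the EM estimation step.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """Estimate I(X;Y) = H(X) + H(Y) - H(X,Y) from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def normalized_mi(xs, ys):
    """Mutual information scaled by min(H(X), H(Y)).

    ASSUMPTION: this particular normalization is chosen for illustration;
    the thesis may define its normalized score differently.
    Returns 0.0 when either attribute is constant (zero entropy).
    """
    denom = min(entropy(xs), entropy(ys))
    if denom == 0:
        return 0.0
    return mutual_information(xs, ys) / denom


# Hypothetical toy attributes (not taken from the Adult dataset).
education = ["hs", "ba", "hs", "ba"]
income = ["low", "high", "low", "high"]   # fully determined by education
gender = ["f", "f", "m", "m"]             # independent of education here

score_dependent = normalized_mi(education, income)
score_independent = normalized_mi(education, gender)
```

With a score of this kind, attribute pairs of different cardinality become comparable, so a structure-learning step could rank candidate parent-child edges on a common scale before the network is used to approximate the joint distribution.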
Keywords/Search Tags: Local Differential Privacy, Improved Bayesian Networks, Artificial Bee Colony Algorithm, High-dimensional Data