Research On The Optimization Of Bayesian Differential Privacy Method For High-Dimensional Data

Posted on: 2020-09-08 | Degree: Master | Type: Thesis
Country: China | Candidate: Y W Tang
GTID: 2428330596473763 | Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet and computer technology, data are accumulating in variety and scale at an unprecedented rate, and the era of big data has arrived. Sound analysis of these data has great practical value. For example, a research institution that analyzes a hospital's raw medical records can infer whether a patient has a certain disease, and a website that analyzes users' rating data can recommend items that better match their preferences. However, such data usually contain a large amount of personal information, and improper use of them threatens privacy. The threat is especially pronounced when high-dimensional data are released in full, because of the strong correlations within high-dimensional data. Reasonable and effective privacy protection methods for high-dimensional data are therefore needed.

At present, privacy protection for high-dimensional data faces three main challenges:
(1) For many natural data sets, the size m of the data domain is much larger than the number n of records, so the traditional k-anonymity privacy model is not suitable for high-dimensional data sets.
(2) Differential privacy assumes that the data are independent, so it cannot provide effective protection for high-dimensional data with correlated attributes: correlation strengthens the adversary's background knowledge and makes privacy leaks more likely.
(3) High-dimensional data contain many attributes whose sensitivity differs, yet most existing methods set a uniform privacy parameter. This not only injects a large amount of noise but also causes heavy information loss, which makes the published data less useful.

It is therefore necessary to design a privacy protection method for releasing high-dimensional data that fully guarantees the privacy and security of the data while effectively preserving their utility.

This thesis studies the privacy problem of high-dimensional data publishing. A Bayesian network is constructed to measure the correlation between attributes, and noise of different levels is added according to that correlation. Through an improved Bayesian differential privacy model and algorithm, the thesis proposes an optimization method for differential privacy based on a Bayesian network model, which solves the privacy problem of high-dimensional data while ensuring the utility of the data. The specific contributions are as follows:
(1) The thesis analyzes and compares the state of research on existing privacy protection methods for high-dimensional data release and identifies three major problems. First, because of the high dimensionality, the k-anonymity model suppresses most of the data and greatly reduces their utility. Second, the differential privacy model mainly targets independent data; it does not consider the correlation between data and therefore cannot fully protect their privacy and security. Third, existing noise-addition methods perturb the data with uniform privacy parameters regardless of whether an attribute is sensitive. This allocation of the privacy budget is unreasonable: some data lose too much information, while others are not protected well enough.
(2) To capture the relationships between attributes in high-dimensional data, the thesis proposes an algorithm that constructs a Bayesian network from the mutual information between attributes and ultimately yields a unique Bayesian network that satisfies differential privacy. The algorithm reduces the dimensionality of the data while fully reflecting the characteristics of the original data set, which ensures the utility of the data.
(3) Building on the correlations computed from this Bayesian network, a new privacy budget allocation method is designed to address excessive information loss in high-dimensional data. The method combines the privacy leakage coefficient with the degree of protection required by sensitive and non-sensitive attributes to allocate the privacy budget reasonably and strengthen the security of the algorithm.
(4) The proposed Bayesian differential privacy method for high-dimensional data is implemented and evaluated on three data sets (Adult, NLTCS, and a medical data set), using relative error, the sum of mutual information, and SVM classification accuracy as performance metrics. The results show that the method effectively protects the privacy of high-dimensional data and preserves the utility of the published data sets.
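The abstract describes the approach only at a high level. As a hedged illustration, not the thesis's actual algorithm, the Python sketch below shows the two mechanisms it combines: choosing parent attributes by pairwise mutual information, and perturbing the resulting Bayesian network's count tables with the Laplace mechanism under a non-uniform privacy budget. The single-parent (degree-1) structure, the non-private greedy search, the `ratio` allocation rule, the toy data, and all function names are assumptions. The thesis's own allocation uses a privacy leakage coefficient that the abstract does not define, so `allocate_budget` is only a stand-in.

```python
import math
import random
from collections import Counter
from itertools import product


def mutual_information(rows, a, b):
    """Empirical mutual information I(A;B), in bits, between columns a and b."""
    n = len(rows)
    joint = Counter((r[a], r[b]) for r in rows)
    pa = Counter(r[a] for r in rows)
    pb = Counter(r[b] for r in rows)
    return sum(
        (c / n) * math.log2((c / n) / ((pa[x] / n) * (pb[y] / n)))
        for (x, y), c in joint.items()
    )


def greedy_network(rows, attrs):
    """Degree-1 network: each newly placed attribute gets, as its single parent,
    the already-placed attribute with the highest empirical mutual information.
    NOTE: this structure search is NOT private (assumption for brevity)."""
    parents = {attrs[0]: None}
    remaining = list(attrs[1:])
    while remaining:
        child, parent = max(
            ((c, p) for c in remaining for p in parents),
            key=lambda cp: mutual_information(rows, cp[0], cp[1]),
        )
        parents[child] = parent
        remaining.remove(child)
    return parents


def allocate_budget(epsilon, attrs, sensitive, ratio=2.0):
    """HYPOTHETICAL rule: a sensitive attribute gets 1/ratio of a non-sensitive
    attribute's share (smaller share = more noise). Shares sum to epsilon, so by
    sequential composition the noisy-count step below is epsilon-DP."""
    raw = {a: (1.0 / ratio if a in sensitive else 1.0) for a in attrs}
    total = sum(raw.values())
    return {a: epsilon * w / total for a, w in raw.items()}


def laplace(scale, rng):
    """Laplace(0, scale) sample: difference of two exponentials with mean scale."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(rows, child, parent, domains, eps, rng):
    """Laplace mechanism on the (child, parent) contingency table. Every cell of
    the public domain is perturbed, including empty ones. Adding or removing one
    record changes one cell by 1, so the L1 sensitivity is 1 and scale = 1/eps."""
    cols = [child] if parent is None else [child, parent]
    counts = Counter(tuple(r[c] for c in cols) for r in rows)
    cells = product(*(domains[c] for c in cols))
    # Clipping negatives to 0 is post-processing and does not weaken privacy.
    return {cell: max(0.0, counts[cell] + laplace(1.0 / eps, rng)) for cell in cells}


# Usage on a toy table (attribute names and values are made up).
rng = random.Random(0)
rows = [
    {"age": "young", "edu": "hs", "disease": "no"},
    {"age": "young", "edu": "ba", "disease": "no"},
    {"age": "old", "edu": "hs", "disease": "yes"},
    {"age": "old", "edu": "ba", "disease": "yes"},
]
attrs = ["age", "edu", "disease"]
domains = {"age": ["young", "old"], "edu": ["hs", "ba"], "disease": ["no", "yes"]}

parents = greedy_network(rows, attrs)  # disease's parent is age (I = 1 bit)
budget = allocate_budget(1.0, attrs, sensitive={"disease"})  # age 0.4, edu 0.4, disease 0.2
tables = {a: noisy_counts(rows, a, parents[a], domains, budget[a], rng) for a in attrs}
```

Because the greedy structure search reads the raw data without noise, this pipeline is not differentially private end to end; only the count-perturbation step is. Methods in the PrivBayes family spend part of the total budget on a private structure search for exactly this reason.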
Keywords/Search Tags: High-dimensional data, Differential privacy, Bayesian network, Privacy protection, Multi-dimensional correlation