Differential Privacy Protection Data Release Via Bayesian Network

Posted on: 2020-07-05  Degree: Master  Type: Thesis
Country: China  Candidate: Y S Dong  Full Text: PDF
GTID: 2428330575471914  Subject: Computer technology
Abstract/Summary:
With the continued development of mobile internet technologies and applications, many companies and organizations (such as search engine companies, e-commerce companies, and Internet service providers) collect large amounts of user behavior data while providing convenient services, and use the data for publishing, statistics, analysis, and mining. However, these data often contain sensitive user information. Before the data are released or used for statistics, they must therefore be privacy-protected to prevent the leakage of users' private data. Differential privacy is currently the most widely used model for privacy-preserving data publishing. It is applied in many privacy protection scenarios because of its strong performance, and it achieves privacy protection mainly by adding noise to the original data. When faced with high-dimensional data, however, existing privacy protection algorithms usually inject excessive noise, so the published data are heavily distorted and have poor accuracy and usability. How to improve the effectiveness and utility of published data while satisfying the differential privacy constraint is therefore a major difficulty. This dissertation addresses the publishing of high-dimensional data sets under differential privacy protection. The research goal is to publish high-dimensional data sets that remain effective and usable while satisfying differential privacy. The main contributions of the dissertation are as follows: (1) The Bayesian network structure is studied and, to address the shortcomings of the existing Bayesian network model, a weighted Bayesian network model based on mutual information and the K2 scoring function is proposed. By selecting the first attribute and determining the value of k, the model uses the low-dimensional marginal distributions of the nodes in the constructed Bayesian network to approximate the full joint distribution of the attributes in the high-dimensional data set, which improves classification accuracy. (2) Once the Bayesian network model is established, noise is added to the data set to satisfy differential privacy. When the noise is added, the order of the attribute fields is taken into account and a heteroscedastic noise addition method is adopted, so that the data set is privacy-protected while retaining high usability. (3) The usability and security of the generated noisy data set and the performance of the algorithm are evaluated experimentally; compared with existing similar algorithms, the algorithm proposed in this dissertation performs better. Figures [13], tables [7], references [55]...
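The two building blocks named in the abstract, mutual information between attributes and noise whose scale differs per attribute, can be illustrated with a minimal Python sketch. This is not the dissertation's algorithm: the function names, the linear budget weighting in `split_budget`, and the use of one-way marginals (rather than the conditional distributions of a Bayesian network) are all assumptions made for illustration. The Laplace mechanism is used with sensitivity 1, which assumes neighboring data sets differ by adding or removing one record.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in nats, of two discrete columns."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def laplace_noise(scale, rng):
    """One draw from Laplace(0, scale): the difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def split_budget(total_epsilon, num_attrs):
    """Hypothetical uneven split: earlier attributes receive a larger share
    of the privacy budget, so their noise has a smaller variance."""
    weights = [num_attrs - i for i in range(num_attrs)]
    total = sum(weights)
    return [total_epsilon * w / total for w in weights]


def noisy_marginal(column, domain, epsilon, rng):
    """Normalized histogram of one column, released with Laplace noise.

    Assumes the domain is public and that the histogram has L1
    sensitivity 1 (add/remove-one-record neighbors).
    """
    counts = Counter(column)
    scale = 1.0 / epsilon
    # Clamping and normalizing are post-processing and cost no extra budget.
    noisy = [max(counts[v] + laplace_noise(scale, rng), 0.0) for v in domain]
    total = sum(noisy)
    if total == 0.0:
        return [1.0 / len(domain)] * len(domain)
    return [v / total for v in noisy]
```

For example, with three attributes and a total budget of 1.0, `split_budget(1.0, 3)` assigns the first attribute half of the budget, and each share can then be passed as `epsilon` to `noisy_marginal` together with a seeded `random.Random` instance. By sequential composition the shares add up to the total budget, so the attributes receive noise of different variances, which is the sense in which the noise here is heteroscedastic.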
Keywords/Search Tags: high-dimensional data, differential privacy, privacy-preserving data publishing, Bayesian network, heteroscedastic noise