
Research On Implicit Privacy Protection Method Based On Clustering Model

Posted on: 2015-01-05
Degree: Master
Type: Thesis
Country: China
Candidate: X M Gao
GTID: 2308330479489712
Subject: Computer Science and Technology
Abstract/Summary:
Government agencies and other organizations produce huge amounts of data every day. Such a mixture of data is also called a “data market”. With the strong promotion of centralized data storage management and the rapid development of the Internet, there is growing demand for publishing and sharing data. However, publishing large volumes of data inevitably risks privacy disclosure, so how to resolve the conflict between privacy protection and data quality has attracted researchers’ attention.

Traditional privacy-preserving methods based on generalization hierarchies usually focus on equivalence classes or data blocks, either making it hard for attackers to infer identifiers or reducing the attackers’ posterior knowledge. This type of strategy considers only part of the data and is therefore called a local method. Its limitation is that it neglects the global cost function and ignores the changes made to the model of the original data set. To solve this problem, this thesis proposes two ideas: a novel t-closeness method and an attribute-based Gaussian mixture model.

First, to address the problem that the original t-closeness ignores the global cost of publishing data during the suppression process, we add a new constraint d. To minimize d, the record with the least cost is suppressed, so that the global cost is reduced.

Second, to link the sensitive attributes with the cluster model of the data, we adopt an improved Gaussian mixture model based on private feature selection. To enhance the model’s discriminative ability, the original component is further divided into three parts. The model parameters are estimated with an integrated likelihood function, and the model can select features directly. To keep a certain distance between the published cluster model and the original one, the weights of the sensitive attributes are restricted to a specific range, so that the published data receive global protection.

The experimental results show that the proposed t-closeness method performs better at protecting private data, and that the novel privacy-preserving Gaussian mixture model has a stronger feature-selection ability.
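As a rough illustration of the baseline t-closeness requirement that the abstract builds on, the sketch below checks whether every equivalence class has a sensitive-attribute distribution within distance t of the distribution over the whole table. It is not the thesis's method and does not model the added constraint d or the suppression cost. The function names and the toy data are hypothetical, and the variational distance is used as the distance measure, which is an assumption suited to a categorical sensitive attribute whose values are all equally far apart.

```python
from collections import Counter


def distribution(values, domain):
    """Relative frequency of each domain value within `values`."""
    counts = Counter(values)
    n = len(values)
    return [counts[v] / n for v in domain]


def variational_distance(p, q):
    """Half the L1 distance between two distributions over the same domain."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def satisfies_t_closeness(classes, t):
    """Check the t-closeness condition for a list of equivalence classes.

    `classes` is a list of lists; each inner list holds the sensitive
    values of the records in one equivalence class (hypothetical layout).
    """
    all_values = [v for cls in classes for v in cls]
    domain = sorted(set(all_values))
    global_dist = distribution(all_values, domain)
    return all(
        variational_distance(distribution(cls, domain), global_dist) <= t
        for cls in classes
    )


# Toy data (invented for illustration): two equivalence classes.
toy_classes = [["flu", "flu", "cancer"], ["flu", "cancer", "cancer"]]
print(satisfies_t_closeness(toy_classes, t=0.2))
print(satisfies_t_closeness(toy_classes, t=0.1))
```

In a suppression-based scheme, a check of this kind would be re-run after each candidate record is removed; how the cost of each removal is defined and minimized is specific to the thesis and is not reproduced here.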
Keywords/Search Tags: privacy-preserving, t-closeness, Gaussian mixture model, feature selection