
Research On Publishing Data Via Differential Privacy With Sensitivity Restriction

Posted on: 2017-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: S H Zhong
Full Text: PDF
GTID: 2348330488473271
Subject: Computer system architecture
Abstract/Summary:
With the rapid development of cloud computing, wireless networking, e-commerce, and other Internet-based technologies, large amounts of data are being gathered on the network. The collectors or owners of these data, however, may belong to different organizations, and comprehensive integrated analysis requires that the data be made available to or shared with the analyst. Because raw data often contain sensitive information about individuals or organizations, effective privacy protection methods are needed. Privacy protection in data publishing and sharing has therefore become an active research field over the past 10 years.

This thesis focuses on privacy protection when publishing data with complex correlations. After reviewing and analyzing the state of research on privacy-preserving data publishing, it describes the privacy disclosure caused by inference attacks when the differential privacy model and its related algorithms are used to mask data with complex correlations. Two sensitivity restriction-based differential privacy models and related algorithms are then proposed to prevent this disclosure without reducing the utility of the data. The main research work is as follows.

(1) Restriction sensitivity-based differential privacy is proposed to address the problem that published data with complex correlations cannot resist inference attacks and therefore fail to conceal private information. Restriction sensitivity-based differential privacy combined with k-anonymity is proposed to address the problem that publishing data with complex correlations adds excessive noise, which reduces the utility of the anonymous data. The first model uses sensitivity parameters to limit the maximum confidence of all implicit sensitive rules in the sensitive template. The second model uses the parameter k as a granularity parameter to limit the minimum number of records in any template.

(2) Greedy partitioning and template specialization are used to implement an approximation algorithm for each of the two proposed differential privacy-preserving data publishing models. Because both models are NP-hard problems, the greedy strategy effectively reduces the search space for template specialization. After the generalized data is greedily partitioned, the template specialization strategy effectively improves the utility of the anonymous data. A discussion of the security and time complexity of the proposed models and algorithms shows that they meet the privacy requirements and have better scalability.

(3) Based on the two implemented approximation algorithms, a corresponding privacy-preserving data publishing system is built, and experiments are carried out on this system using the real data set Adult. In experiments on different data sets of various sizes, the results for decision tree classification error rate and privacy-preserving processing time indicate that the two models proposed in this thesis are safe and effective for data publishing. Moreover, the running time changes little as the size of the experimental data increases, which indicates that the two algorithms remain valid for large-scale data. The experiments also show that a moderate granularity parameter can improve the utility of the anonymous data.
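
The two restrictions described in (1) can be illustrated with a minimal Python sketch. It is not the thesis's algorithm. It assumes that a "template" is a combination of generalized quasi-identifier values, that the confidence of a sensitive rule is the fraction of records in a template that share one sensitive value, and that noisy group counts are released with the standard Laplace mechanism. All function names, parameter names (`max_conf`, `k`, `epsilon`), and the toy records are hypothetical.

```python
import random
from collections import Counter, defaultdict


def group_by_template(records, qi_keys):
    """Group records by their (generalized) quasi-identifier values."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[key] for key in qi_keys)].append(rec)
    return groups


def max_confidence(group, sensitive_key):
    """Highest confidence of any rule 'template -> sensitive value' in a group."""
    counts = Counter(rec[sensitive_key] for rec in group)
    return max(counts.values()) / len(group)


def satisfies_restrictions(records, qi_keys, sensitive_key, max_conf, k):
    """True if every template holds at least k records (granularity)
    and no sensitive rule exceeds max_conf (assumed confidence limit)."""
    groups = group_by_template(records, qi_keys)
    return all(
        len(group) >= k and max_confidence(group, sensitive_key) <= max_conf
        for group in groups.values()
    )


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two i.i.d. exponential
    samples with mean `scale` follows this distribution."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(records, qi_keys, epsilon, rng, sensitivity=1.0):
    """Standard Laplace mechanism on per-template counts.
    sensitivity=1.0 assumes independent records; correlated records
    would need a larger value, and therefore more noise."""
    scale = sensitivity / epsilon
    groups = group_by_template(records, qi_keys)
    return {tpl: len(grp) + laplace_noise(scale, rng) for tpl, grp in groups.items()}


# Toy records loosely shaped like the Adult data set (values are made up).
records = [
    {"age": "30-39", "education": "Bachelors", "income": ">50K"},
    {"age": "30-39", "education": "Bachelors", "income": "<=50K"},
    {"age": "30-39", "education": "Bachelors", "income": "<=50K"},
    {"age": "40-49", "education": "HS-grad", "income": "<=50K"},
    {"age": "40-49", "education": "HS-grad", "income": "<=50K"},
]
qi_keys = ["age", "education"]

ok = satisfies_restrictions(records, qi_keys, "income", max_conf=0.7, k=2)
released = noisy_counts(records, qi_keys, epsilon=1.0, rng=random.Random(0))
```

In this toy table the second template maps every record to the same income value, so its rule has confidence 1.0 and the check fails for any `max_conf` below 1. A partitioning algorithm would have to generalize or merge that template before release.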
Keywords/Search Tags: table data, data publishing, privacy protection, differential privacy, k-anonymity