
Research On Several Problems In Privacy Preserving Information Sharing

Posted on: 2008-09-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z H Wang
GTID: 1118360215984461
Subject: Computer software and theory
Abstract/Summary:
Rapid progress in computing power, storage technology, and network technology has greatly facilitated the digitalization of information, making information sharing much easier and more convenient than before. However, privacy breaches have also become frequent. For fear of disclosing private information, people are reluctant to share their data. The goal of privacy-preserving information sharing is to share information effectively while preventing the disclosure of sensitive information. Recently, it has become an active direction in database and data mining research.

In this dissertation, we first investigate the problem of preserving anonymity in data sharing. Then, treating knowledge as represented by a set of frequent patterns, we investigate the problems of protecting sensitive patterns in data sharing, hiding sensitive patterns in frequent pattern sharing, and blocking inference channels in frequent pattern sharing. The main contributions are as follows:

(1) For the problem of preserving anonymity in data sharing, we propose a clustering-based approach for implementing l-diversity. Our approach meets the requirement of anonymization in data sharing and preserves individuals' privacy by preventing the disclosure of sensitive attribute values. Furthermore, our approach removes the constraints of the domain hierarchy structure that traditional data anonymization approaches rely on. It integrates the evaluation of the information loss resulting from anonymization into the clustering process and adopts a flexible data generalization strategy. The experimental results show that our approach greatly reduces the information loss caused by anonymization.

(2) For the problem of protecting sensitive patterns in data sharing, we propose a data sanitization approach based on a weak pattern tree.
This approach takes into account the side effect of data sanitization on non-sensitive patterns while protecting sensitive patterns. By traversing the related parts of the weak pattern tree, the approach calculates the scores of sensitive items and sensitive transactions, and identifies a candidate item for each sensitive transaction. Then, it chooses the sensitive transactions with the higher scores and removes the candidate item from these chosen transactions to protect sensitive patterns. The experimental results show that our approach meets the requirement of privacy preservation and also reduces the side effect of data sanitization on non-sensitive patterns, improving the utility of the shared data.

(3) For the problem of hiding sensitive patterns in frequent pattern sharing, we propose the idea of a privacy-free frequent pattern set. It prevents an attacker from inferring the existence of sensitive patterns from the set of shared frequent patterns, and thus provides strong privacy preservation for sensitive patterns. Furthermore, we prove that finding a maximal privacy-free frequent pattern set is an NP-hard problem. Then, we present an item-based pattern sanitization approach and show that it is guaranteed to generate a privacy-free frequent pattern set. We also give the details of three item-based pattern sanitization algorithms and compare their performance in our experiments.

(4) For the problem of blocking inference channels in frequent pattern sharing, we propose to eliminate the related inference channels so that an attacker cannot determine whether a sensitive pattern is frequent. Based on an analysis of the correlation between frequent patterns, we classify the potential inference channels into three categories: superset inference channel, subset inference channel, and chain inference channel. We also point out the privacy breaches in the approach proposed by previous work.
Furthermore, we present two pattern sanitization algorithms for blocking inference channels in frequent pattern sharing and evaluate their performance experimentally.
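
To make the l-diversity requirement in contribution (1) concrete, the following is a minimal sketch of a check for distinct l-diversity, the simplest form of the property: every group of records sharing the same quasi-identifier values must contain at least l distinct sensitive values. It is not the clustering algorithm of the dissertation. The function name `is_l_diverse`, the field names, and the sample records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def is_l_diverse(records, quasi_identifiers, sensitive, l):
    """Check distinct l-diversity (hypothetical helper, not the thesis algorithm).

    Records are grouped into equivalence classes by their quasi-identifier
    values; each class must hold at least `l` distinct sensitive values.
    """
    groups = defaultdict(set)
    for rec in records:
        key = tuple(rec[q] for q in quasi_identifiers)
        groups[key].add(rec[sensitive])
    return all(len(values) >= l for values in groups.values())


# Illustrative, made-up generalized records.
records = [
    {"age": "20-29", "zip": "2000**", "disease": "flu"},
    {"age": "20-29", "zip": "2000**", "disease": "asthma"},
    {"age": "30-39", "zip": "2001**", "disease": "flu"},
    {"age": "30-39", "zip": "2001**", "disease": "flu"},
]

print(is_l_diverse(records, ["age", "zip"], "disease", 1))
print(is_l_diverse(records, ["age", "zip"], "disease", 2))
```

In this made-up table the second group contains only one distinct disease, so the table fails the check for l = 2. An anonymization method would have to regroup or further generalize the records until every group passes.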
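
The inference problem behind contributions (3) and (4) can be illustrated with the well-known downward-closure property of frequent patterns: every subset of a frequent pattern is itself frequent. The sketch below assumes that a superset inference channel arises when a shared pattern contains a sensitive pattern; this reading, the function name `leaks_via_superset`, and the sample patterns are assumptions for illustration. It does not cover the subset or chain channels, whose definitions the abstract does not give, and it is not one of the dissertation's sanitization algorithms.

```python
def leaks_via_superset(shared_patterns, sensitive_patterns):
    """Return the sensitive patterns an attacker can infer to be frequent.

    Hypothetical helper: a sensitive pattern is considered leaked when some
    shared frequent pattern contains it (or equals it), because every subset
    of a frequent pattern must itself be frequent.
    """
    shared = [frozenset(p) for p in shared_patterns]
    leaked = []
    for pattern in sensitive_patterns:
        pattern = frozenset(pattern)
        if any(pattern <= candidate for candidate in shared):
            leaked.append(pattern)
    return leaked


# Illustrative, made-up pattern sets.
shared = [{"a", "b", "c"}, {"b", "d"}]
sensitive = [{"a", "b"}, {"c", "d"}]

print(leaks_via_superset(shared, sensitive))
```

Here the sensitive pattern {a, b} is leaked because the shared pattern {a, b, c} contains it, while {c, d} is not contained in any shared pattern. Removing only the sensitive patterns themselves from the shared set is therefore not enough.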
Keywords/Search Tags: Privacy Preservation, Information Sharing, Data Anonymization, Frequent Pattern, Sensitive Pattern, Data Sanitization, Pattern Sanitization, Inference Channel