Font Size: a A A

Research On K-anonymity Algorithm And Anonymous Technology In Privacy Protection

Posted on:2012-04-12Degree:DoctorType:Dissertation
Country:ChinaCandidate:X M HeFull Text:PDF
GTID:1488303356471264Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Data mining and data publication are two important concerns of database applications. Knowledge discovery and data mining play a key role in many database applications. Data mining is to extract underlying valuable knowledge, models, or rules etc, while data publication is to illustrate the data to users directly. However, without careful protection measures, person-specific data will suffer privacy leakage, thus harming the data owners. Take the mining of association rules from medical cases for further disease predication and control as an example. The databases in hospitals that maintain medical cases may contain disease information of particular individuals. Privacy becomes a more and more serious concern in many data applications.Anonymization is an effective approach to prevent privacy leakage. The principle is to alter the original data (generalizing or suppressing etc.) and make them useless to identify any individual even with external information. How to anonymize data that contains privacy information has attracted much research work and received considerable attention from the database community. Generally speaking, data privacy preservation involves the following two concerns:(1) how to protect privacy in the appli-cations; (2) how to improve data utility. Both academic and industrial fields are eager to seek a balance between the two concerns.This thesis focuses on improving the data utility while providing enough privacy preservation, and involves anonymization algorithm and technique. The contributions of this thesis are:(1) K-anonymity is one of the most important anonymity models, which can be achieved by many techniques. Among them generalization is a common technique that has been used in many previous methods. The generalization-based approach follows a common framework to achieve K-anonymity of a table:divide the table into many QI (quasi-identifier)-groups so that the size of each QI-group is at least?K. However, we find that the utility of anonymized data produced by this approach can be significantly improved without sacrificing K-anonymity privacy requirement if the size of QI-group is reduced. Motivated by this observation, we propose linking-based anonymity model, which achieves K-anonymity with QI-groups of the size less than K. To implement linking-based anonymity model, we propose a simple yet efficient heuristic local recoding method, whose correctness can be proved by theoretic analysis. Through extensive experiments on real data sets, we demonstrate that the utility has been significantly improved by our approach compared with the state-of-the-art methods.(2) We analyze the feature of the marginal publication, and explore the drawback of the existing framework. We introduce the concept of m-invariance and give the sufficient and necessary conditions for satisfying m-invariance partitions. It takes linear time to check whether the satisfying partitions exist. After extensive experiments on real data are conducted, the results show that our technique allows highly effective data analysis while offering strong privacy guarantees. (3) Based on the existing anonymization technique, a new effective technique called permutation anonymization (PA) is presented, which optimizes the data utility. PA can be seen as an advancement of the Anatomy technology. PA is able to retain more information in the microdata. Moreover, PA can resist the presence attack and enjoy a wider range of applications. We present the details of the technique and build its underlying theory.(4) Encryption is an important technique in the distributed environments for privacy preservation. We focus on the stability of several famous cipher streams, including Legendre sequences, Hall sextic residue sequence and generalized cyclotomic sequence. We give the GF(p) linear complexity of Legen-dre sequence, which is a generalized result of Dingcunsheng's famous paper "On the linear complexity of Legendre sequences [1]". The(?)-error linear complexity of generalized cyclotomic sequence is also given, and the result shows that it can be approximated by minimal polynomial with degree no more than p+q, which is much less than its complexity L about (L?[(?), pq]). Moreover, we studied on the GF(p) linear complexity of Hall sextic residue sequence.
Keywords/Search Tags:privacy preservation, k-anonymity, marginal publication, anonymization technique, cryptog-raphy, algorithm
PDF Full Text Request
Related items