
Research On Anonymity Models And Algorithms For Privacy-Preservation Data Publishing

Posted on: 2011-03-12  Degree: Master  Type: Thesis
Country: China  Candidate: J Yu  Full Text: PDF
GTID: 2178360308470759  Subject: Computer software and theory
Abstract/Summary:
Databases hold a great deal of data about individuals, known as microdata, such as demographic data, customer shopping data, and medical data. These data play an important role in trend analysis, market prediction, and similar tasks. However, because they often contain private information about individuals, publishing or sharing them threatens individuals' privacy. Research on privacy preservation in data publishing and sharing is therefore of great significance.

Privacy preservation in data publishing has gained wide attention and become a hot topic in the database area. Among the various privacy-preservation approaches, anonymity has become one of the most popular because of its security and effectiveness. The main idea of anonymity is to modify the original data so that adversaries cannot uniquely identify an individual's identity or sensitive values, thereby protecting individuals' privacy. This thesis concentrates on anonymity models and anonymity algorithms, and its main contributions are as follows:

(1) An efficient algorithm for realizing k-anonymity, named TopDown-KACA, is proposed. KACA is a k-anonymization algorithm that incurs relatively little information loss, but it is inefficient, especially on large datasets. Top-down is an efficient anonymization algorithm, but it incurs heavy information loss. This thesis therefore proposes TopDown-KACA, which combines Top-down with KACA. Experiments show that the proposed algorithm incurs information loss similar to that of KACA and runs about as efficiently as Top-down, so it can realize k-anonymity effectively.

(2) An anonymity model that provides individualized privacy preservation for sensitive values is proposed. Existing anonymity models for privacy preservation, such as k-anonymity and l-diversity, do not consider individualized privacy preservation for different sensitive values, so they cannot provide enough protection for every individual when sensitive values are not uniformly distributed. To solve this problem, the thesis proposes a complete (α,k)-anonymity model, which implements individualized privacy preservation for each sensitive value by setting a frequency constraint on each sensitive value in every equivalence class. The thesis also proposes an (α,k)-clustering algorithm based on weighted hierarchical distances. Experimental results show that the complete (α,k)-anonymity model can realize individualized privacy preservation effectively.

(3) A multi-level l-diversity model for numerical sensitive attributes is proposed. The l-diversity model is suited to categorical sensitive attributes rather than numerical ones. To address this problem, this thesis proposes a multi-level l-diversity model for numerical sensitive attributes. The proposed model first divides the domain of the numerical sensitive attribute into several levels and then realizes l-diversity on those levels. The thesis also designs an l-incognito algorithm to realize the multi-level l-diversity model. Experiments compare the proposed model with the existing l-diversity model in terms of the diversity of the anonymized tables. The results show that the tables generated by the former are more diverse than those generated by the latter, so the former provides stronger protection against homogeneity attacks and background knowledge attacks.
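As a rough illustration of the models described in (2) and (3), the Python sketch below checks whether an already generalized table satisfies them. It is not the thesis's algorithm (neither the (α,k)-clustering algorithm nor l-incognito is reproduced here). The definitions are an assumed reading of the abstract: a table is treated as complete (α,k)-anonymous if every equivalence class has at least k records and each sensitive value s accounts for at most alphas[s] of its class, and as multi-level l-diverse if every class covers at least l distinct levels of the numerical sensitive attribute. The function names, the `alphas` mapping, the level `boundaries`, and the sample table are all hypothetical.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def equivalence_classes(rows, qi_columns):
    """Group rows (dicts) that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[c] for c in qi_columns)].append(row)
    return list(classes.values())


def is_complete_alpha_k_anonymous(rows, qi_columns, sensitive, k, alphas):
    """Assumed reading: every class has >= k rows, and each sensitive value s
    makes up at most alphas[s] of its class (values absent from alphas are
    treated as unconstrained, i.e. alpha = 1.0)."""
    for group in equivalence_classes(rows, qi_columns):
        if len(group) < k:
            return False
        counts = Counter(row[sensitive] for row in group)
        for value, count in counts.items():
            if count / len(group) > alphas.get(value, 1.0):
                return False
    return True


def is_multilevel_l_diverse(rows, qi_columns, sensitive, boundaries, l):
    """Assumed reading: map each numerical value to a level using sorted
    boundaries, then require >= l distinct levels in every class."""
    for group in equivalence_classes(rows, qi_columns):
        levels = {bisect_right(boundaries, row[sensitive]) for row in group}
        if len(levels) < l:
            return False
    return True


# Hypothetical generalized table with two equivalence classes.
QI = ["age", "zip"]
table = [
    {"age": "20-29", "zip": "476**", "disease": "flu", "salary": 3000},
    {"age": "20-29", "zip": "476**", "disease": "HIV", "salary": 5200},
    {"age": "20-29", "zip": "476**", "disease": "flu", "salary": 9000},
    {"age": "30-39", "zip": "479**", "disease": "flu", "salary": 3100},
    {"age": "30-39", "zip": "479**", "disease": "cold", "salary": 3300},
    {"age": "30-39", "zip": "479**", "disease": "flu", "salary": 3500},
]

# A stricter bound on the more sensitive value "HIV" than on "flu".
ok = is_complete_alpha_k_anonymous(table, QI, "disease", 3,
                                   {"HIV": 0.34, "flu": 0.7})
# Salary levels: below 4000, 4000 to 7999, 8000 and above.
diverse = is_multilevel_l_diverse(table, QI, "salary", [4000, 8000], 2)
```

In this sample, the second equivalence class holds three distinct salaries that all fall into the lowest level, so the level-based check rejects it even though the raw values differ. Tightening the bound for a single value, for example `{"HIV": 0.2}`, makes the first check fail without changing the bounds for other values.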
Keywords/Search Tags: privacy preservation, k-anonymity, complete (α,k)-anonymity, multi-level diversity, generalization