
Research On Microdata Anonymity Algorithms For Privacy-Preservation Data Publishing

Posted on: 2012-02-21
Degree: Master
Type: Thesis
Country: China
Candidate: Z Z Xia
Full Text: PDF
GTID: 2218330368479465
Subject: Computer software and theory
Abstract/Summary:
A large amount of data relates to individuals. Such data, called microdata, include demographic data, customer shopping data, and medical data, and they play an important role in tasks such as trend analysis and market prediction. However, publishing or sharing these data threatens individuals' privacy. Research on privacy preservation in data publishing therefore has great practical significance and theoretical value.
Anonymization is a privacy-preservation method that has become one of the most popular in recent years because of its security and effectiveness. Its main idea is to modify the original data so that adversaries cannot uniquely identify the individual to whom a sensitive value belongs, thereby protecting individuals' privacy. This thesis investigates anonymity models and techniques, and proposes a hybrid method to address the shortcomings of existing methods when anonymizing mixed data. Our main contributions are as follows:
(1) A (k,e)-MDAV algorithm for numerical sensitive attributes is proposed. MDAV (Maximum Distance to Average Vector) is an efficient microaggregation algorithm. However, it does not capture the diversity of sensitive values within each equivalence class, so the anonymized table it generates cannot resist homogeneity attacks or background knowledge attacks. To solve this problem, we propose a (k,e)-MDAV algorithm for microdata with a numerical sensitive attribute. The algorithm groups at least the k tuples nearest to the cluster center into one cluster, and further requires the range of the distinct sensitive values in each cluster to be no less than e, which prevents the privacy disclosure caused by similar sensitive values within the same equivalence class. Experimental results show that the algorithm maintains the efficiency of MDAV while generating more secure anonymized tables that satisfy the (k,e)-anonymity model.
(2) An efficient hybrid k-anonymization method for mixed data is proposed.
Mixed data, containing both numerical and categorical attributes, is the more general case in databases. Microaggregation changes the probability distribution of categorical data, while generalization/suppression loses the semantic information of numerical data; both reduce data utility. To address this problem, this thesis proposes an efficient hybrid k-anonymization method. The idea is to replace the original categorical values with generalized values, which preserves more of the categorical semantics, and to replace the original numerical values with the mean vector of the numerical data, which preserves more of the numerical semantics. To improve the efficiency of k-anonymizing large datasets, we first use the c-prototype algorithm to partition a large dataset into several subclusters, each containing more than k tuples, and then anonymize each subcluster. Experimental results show that the method can k-anonymize mixed data effectively.
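The two ideas summarized in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names (`satisfies_k_e`, `generalize`, `anonymize_cluster`), the row-as-dictionary layout, and the child-to-parent hierarchy format are all assumptions made for illustration. The (k,e) check follows the wording of the abstract (at least k tuples, range of distinct sensitive values at least e), and the hybrid step assumes a cluster has already been formed, for example by a clustering pass that is not shown here.

```python
from statistics import mean


def satisfies_k_e(sensitive_values, k, e):
    """Check the (k,e) condition as worded in the abstract (an assumption):
    the cluster holds at least k tuples and the range of its distinct
    numerical sensitive values is at least e."""
    if not sensitive_values:
        return False
    distinct = set(sensitive_values)
    return len(sensitive_values) >= k and max(distinct) - min(distinct) >= e


def generalize(values, parent):
    """Return the lowest common ancestor of categorical values.

    `parent` is a hypothetical hierarchy given as a child -> parent dict.
    """
    def path_to_root(value):
        path = [value]
        while value in parent:
            value = parent[value]
            path.append(value)
        return path

    paths = [path_to_root(v) for v in values]
    common = set(paths[0]).intersection(*paths[1:])
    # Walk upward from the first value; the first shared node is the LCA.
    for node in paths[0]:
        if node in common:
            return node
    raise ValueError("values share no common ancestor")


def anonymize_cluster(cluster, numeric_cols, categorical_cols, hierarchies):
    """Hybrid replacement for one cluster of rows (list of dicts):
    numerical quasi-identifiers become the cluster mean, categorical
    quasi-identifiers become their common generalization. Columns not
    listed (such as the sensitive attribute) are left unchanged."""
    representative = {}
    for col in numeric_cols:
        representative[col] = mean([row[col] for row in cluster])
    for col in categorical_cols:
        representative[col] = generalize(
            [row[col] for row in cluster], hierarchies[col]
        )
    return [{**row, **representative} for row in cluster]
```

In this sketch, a cluster would be released only after `satisfies_k_e` accepts its sensitive values, and every row in the cluster then shares the same quasi-identifier values produced by `anonymize_cluster`.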
Keywords/Search Tags: K-anonymization, Generalization/Suppression, Microdata, Microaggregation, Privacy Preservation