Research In Microaggregation Algorithm For K-Anonymization

Posted on: 2010-08-22
Degree: Master
Type: Thesis
Country: China
Candidate: T T Cen
Full Text: PDF
GTID: 2178360278968331
Subject: Computer software and theory
Abstract/Summary:
The k-anonymity model has been extensively investigated for its simplicity and practicability. The model requires that each record in the anonymized table be indistinguishable from at least k-1 other records within the table with respect to a set of quasi-identifier attributes. In this case, individuals cannot be uniquely identified by adversaries, so their privacy is preserved. Most existing k-anonymization algorithms are based on generalization and suppression techniques, which have shortcomings in efficiency and in preserving the semantics of numerical data. Recently, microaggregation techniques have been introduced to k-anonymize datasets, remedying some of these shortcomings. The idea of microaggregation is to partition a table into clusters using a heuristic method, such that each cluster contains at least k records and the records within a cluster are as similar as possible. The records of each cluster are then replaced by the cluster's centroid to achieve k-anonymization.

In this thesis, we investigate a microaggregation algorithm based on global search, implement a microaggregation algorithm for mixed data, and propose a comprehensive evaluation framework for microaggregation algorithms. The main contributions are as follows:

(1) An ICSMA (Immune Clonal Selection Microaggregation Algorithm) is proposed to improve the quality of anonymized data. It improves the standard Immune Clonal Selection Algorithm (ICSA) by introducing an adjusting operator that deletes invalid antibodies during antibody evolution to accelerate convergence. The experimental results show that ICSMA generates anonymized tables with less information loss and lower disclosure risk than the MDAV algorithm.

(2) A microaggregation algorithm for mixed data is proposed to address the weakness of existing microaggregation algorithms in anonymizing categorical data.
The algorithm adopts Euclidean distance for numerical data and a weighted hierarchy distance for categorical data, and then combines the two into a mixed distance for mixed data. We take mode values as the centers of categorical attributes and mean values as the centers of numerical attributes. The record values of each cluster are then replaced by this centroid to achieve k-anonymization. Experiments show that the distance measure for categorical data causes less distortion, and that the improved microaggregation algorithm based on the mixed distance achieves better clustering quality than the traditional MDAV algorithm.

(3) An evaluation model for k-anonymized data oriented to microaggregation (EM4ADOM) is proposed. The model can evaluate microaggregation algorithms in terms of data utility, information loss, and the trade-off between the two. Experimental results show that the model can evaluate anonymized data comprehensively.
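
The mixed-data idea in contribution (2) can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract does not give the weighted hierarchy distance, the attribute weights, or the clustering heuristic, so the toy taxonomy `PARENT`, the weights `w_num` and `w_cat`, and the greedy fixed-size grouping below are all assumptions made for illustration. Numerical attributes are assumed to be pre-scaled to [0, 1], and at least k records are assumed to be present.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy taxonomy (child -> parent); "*" is the root.
PARENT = {"nurse": "medical", "doctor": "medical", "teacher": "education",
          "medical": "*", "education": "*"}

def path_to_root(value):
    path = [value]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def hierarchy_distance(a, b):
    """Assumed categorical distance: steps from a and b up to their
    lowest common ancestor, scaled into [0, 1]."""
    if a == b:
        return 0.0
    pa, pb = path_to_root(a), path_to_root(b)
    common = next(node for node in pa if node in pb)
    return (pa.index(common) + pb.index(common)) / (len(pa) + len(pb) - 2)

def mixed_distance(r, s, w_num=0.5, w_cat=0.5):
    """Euclidean distance on numerical values plus hierarchy distance on
    categorical values; the equal weights are an assumption."""
    num = sqrt(sum((x - y) ** 2 for x, y in zip(r[0], s[0])))
    cat = sum(hierarchy_distance(x, y) for x, y in zip(r[1], s[1]))
    return w_num * num + w_cat * cat

def centroid(cluster):
    """Mean for each numerical attribute, mode for each categorical one."""
    means = tuple(sum(r[0][i] for r in cluster) / len(cluster)
                  for i in range(len(cluster[0][0])))
    modes = tuple(Counter(r[1][j] for r in cluster).most_common(1)[0][0]
                  for j in range(len(cluster[0][1])))
    return (means, modes)

def microaggregate(records, k):
    """Greedy fixed-size grouping (a simplification, not MDAV or ICSMA)."""
    remaining = list(range(len(records)))
    clusters = []
    while len(remaining) >= 2 * k:
        seed = remaining[0]
        nearest = sorted(
            remaining,
            key=lambda i: mixed_distance(records[seed], records[i]))[:k]
        clusters.append(nearest)
        remaining = [i for i in remaining if i not in nearest]
    clusters.append(remaining)  # final group holds between k and 2k-1 records
    anonymized = list(records)
    for members in clusters:
        c = centroid([records[i] for i in members])
        for i in members:
            anonymized[i] = c  # replace each record by its cluster centroid
    return anonymized, clusters

# Each record: (numerical values, categorical values).
records = [((0.10,), ("nurse",)), ((0.12,), ("doctor",)), ((0.15,), ("nurse",)),
           ((0.80,), ("teacher",)), ((0.85,), ("teacher",)), ((0.90,), ("doctor",))]
anonymized, clusters = microaggregate(records, k=3)
```

After the call, every distinct row in `anonymized` occurs at least k times, which is the k-anonymity condition restricted to the quasi-identifier attributes held in each record.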
Keywords/Search Tags:K-anonymization, Generalization/Suppression, Microdata, Microaggregation, Privacy Preservation, Immune Clonal Selection Algorithm