
Research On Anonymization Privacy Protection Techniques Based On Clustering

Posted on: 2014-10-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P S Wang
GTID: 1268330422479748
Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet and database technologies, more and more data are collected, released, and used, and these data may contain private individual information. How to protect individual privacy during data publishing and utilization has therefore become a research focus in both academia and industry. Anonymization is one of the primary techniques for privacy protection in data release. Its basic idea is to release only lower-precision but semantically equivalent data, obtained by performing generalization/suppression operations on the quasi-identifier attributes.

Since Sweeney proposed the k-anonymity model, anonymization techniques have attracted significant attention from researchers because they can accomplish privacy protection simply and effectively. Because optimal anonymization is an NP-hard problem, several k-anonymity methods have been proposed to enhance privacy protection in data publishing while reducing information loss. However, these methods are vulnerable to homogeneity attacks and background-knowledge attacks, because they mainly perform generalization/suppression on the quasi-identifiers without imposing any restrictions on the sensitive attributes. To address this issue, Machanavajjhala et al. proposed the l-diversity model on the basis of k-anonymity: it takes the diversity of sensitive attribute values within each equivalence class into account and requires at least l "well-represented" sensitive values in every equivalence class, further improving privacy protection.
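For readers unfamiliar with the two models, the following minimal Python sketch shows how the k-anonymity and (distinct) l-diversity properties can be checked on a released table. The column names, toy records, and helper functions are hypothetical illustrations, not material from the dissertation.

# Hedged sketch: checking k-anonymity and distinct l-diversity on a released table.
from collections import defaultdict

def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same (generalized) quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in quasi_identifiers)
        groups[key].append(row)
    return groups

def satisfies_k_anonymity(rows, quasi_identifiers, k):
    """Every equivalence class must contain at least k records."""
    return all(len(g) >= k
               for g in equivalence_classes(rows, quasi_identifiers).values())

def satisfies_distinct_l_diversity(rows, quasi_identifiers, sensitive, l):
    """Every equivalence class must contain at least l distinct sensitive values."""
    return all(len({r[sensitive] for r in g}) >= l
               for g in equivalence_classes(rows, quasi_identifiers).values())

# Toy generalized release: age generalized to ranges, ZIP code truncated.
release = [
    {"age": "20-30", "zip": "477**", "disease": "flu"},
    {"age": "20-30", "zip": "477**", "disease": "cancer"},
    {"age": "30-40", "zip": "479**", "disease": "flu"},
    {"age": "30-40", "zip": "479**", "disease": "hepatitis"},
]
print(satisfies_k_anonymity(release, ["age", "zip"], k=2))                       # True
print(satisfies_distinct_l_diversity(release, ["age", "zip"], "disease", l=2))   # True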
Research has shown that most current anonymization methods based on generalization and suppression suffer from significant information loss, and consequently poor usability, mainly because they rely heavily on pre-defined generalization hierarchies or on an order imposed on each attribute domain. Moreover, existing anonymization techniques focus on protecting private information but ignore the actual usefulness of the anonymized data, which limits their applicability. This thesis therefore designs clustering-based anonymization algorithms for privacy protection, studying the privacy, the information loss, and the usefulness of the anonymized data. Our work aims to minimize information loss while ensuring privacy protection, to enhance the application value of anonymized data, and ultimately to achieve a reasonable balance between individual privacy protection and data usability. The main contributions and innovations of this dissertation are as follows.

(1) To address the sensitivity to outliers and the high information loss of existing k-anonymity algorithms, an improved clustering-based k-anonymity algorithm is proposed. Theoretical analysis and experimental results show that the algorithm effectively resolves the outlier-sensitivity issue while generating equivalence classes in a single clustering pass, so that information loss is reduced and the quality of the k-anonymized data is improved.

(2) To eliminate the adverse effect of generalization/suppression on the quality of l-diverse anonymized data, a clustering-based l-diversity algorithm is provided. The algorithm reduces information loss but remains vulnerable to the skewness attack; therefore, a sensitive-value-constrained l-diversity algorithm is also offered. Theoretical analysis and experimental results demonstrate that the improved algorithm not only raises the degree of privacy protection for sensitive data but also effectively reduces information loss and improves the quality of the l-diverse anonymized data.

(3) With respect to inference attacks in dynamic anonymized data publishing, an l-diversity algorithm based on incremental clustering is proposed. Theoretical analysis and experimental results indicate that, by keeping the signature of each equivalence class unchanged, the algorithm achieves secure and efficient data release for fully dynamically updated data sets.

(4) To address the poor usability of released anonymized data, we design a data-classification-oriented l-diversity algorithm that builds a utility influence matrix of the quasi-identifier attributes on the sensitive attributes. Theoretical analysis and experimental results illustrate that the algorithm satisfactorily meets the application requirements of data classification while protecting individual privacy.
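All four contributions share the same core operation: forming equivalence classes by clustering records rather than by walking a fixed generalization hierarchy. The sketch below illustrates that general idea only; the greedy strategy, distance function, record format, and helper names are assumptions made for illustration and do not reproduce the dissertation's specific algorithms.

# Hedged sketch of a clustering-based approach: records are greedily grouped into
# equivalence classes of at least k members, and a class is only closed once it
# also covers at least l distinct sensitive values.

def distance(r1, r2, numeric_qi):
    """Simple L1 distance over numeric quasi-identifiers
    (a real implementation would normalize each attribute)."""
    return sum(abs(r1[a] - r2[a]) for a in numeric_qi)

def greedy_cluster(records, numeric_qi, sensitive, k, l):
    remaining = list(records)
    clusters = []
    while len(remaining) >= k:
        seed = remaining.pop(0)
        cluster = [seed]
        # Grow the class with nearest records until both constraints hold.
        # If remaining runs out mid-growth, the last class may be undersized;
        # a real implementation would repair it.
        while remaining and (len(cluster) < k or
                             len({r[sensitive] for r in cluster}) < l):
            nearest = min(remaining, key=lambda r: distance(seed, r, numeric_qi))
            remaining.remove(nearest)
            cluster.append(nearest)
        clusters.append(cluster)
    # Leftover records are merged into their closest existing classes.
    for r in remaining:
        if clusters:
            best = min(clusters, key=lambda c: distance(c[0], r, numeric_qi))
            best.append(r)
    return clusters

def generalize(cluster, numeric_qi):
    """Publish each numeric quasi-identifier as the (min, max) range of the class."""
    spans = {a: (min(r[a] for r in cluster), max(r[a] for r in cluster))
             for a in numeric_qi}
    return [{**r, **spans} for r in cluster]

# Hypothetical usage on toy records.
records = [
    {"age": 23, "zip": 47711, "disease": "flu"},
    {"age": 27, "zip": 47713, "disease": "cancer"},
    {"age": 35, "zip": 47906, "disease": "flu"},
    {"age": 38, "zip": 47901, "disease": "hepatitis"},
    {"age": 61, "zip": 47300, "disease": "flu"},
]
for cluster in greedy_cluster(records, ["age", "zip"], "disease", k=2, l=2):
    print(generalize(cluster, ["age", "zip"]))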
Keywords/Search Tags:Data release, privacy protection, k-anonymity, l-diversity, clustering, usability