Research On Data Anonymization Techniques For Data Publishing

Posted on: 2017-09-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Y Gong
Full Text: PDF
GTID: 1318330515985538
Subject: Computer application technology
Abstract/Summary:
Data anonymization is a widely used privacy protection technology based on data coarsening and hiding. Existing data anonymization techniques mainly achieve anonymity through generalization, suppression and similar operations, which keep the data truthful, so that the anonymized data remains practically useful. Meanwhile, an adversary cannot re-identify an individual or his/her sensitive information from the anonymized data with high confidence. At present, data anonymization techniques are mostly used to address privacy issues in data publishing, location-based services, social networks and data querying. Among them, the data anonymization techniques for data publishing are some of the most important techniques for privacy-preserving data sharing. However, these techniques suffer from several drawbacks in practice. This dissertation studies data anonymization techniques for data publishing and addresses some of these deficiencies.

Directly applying state-of-the-art data anonymization techniques to high-dimensional, high-missing-rate and complex relational datasets suffers from the following drawbacks. First, existing data anonymization approaches may distort most of the useful information in high-dimensional data publishing because of the curse of dimensionality. Second, most existing anonymization approaches are not adaptable to missing values; directly applying them to high-missing-rate data may cause extensive information loss due to missing value pollution. Third, applying state-of-the-art approaches to complex relational data may cause privacy breaches and extensive information loss, because an individual can occur multiple times in the dataset. Thus, it is important to develop novel anonymization models and algorithms to address these issues.

In this dissertation, we propose several approaches for high-dimensional, high-missing-rate and complex relational datasets. First, for high-dimensional data publishing, we propose a top-down algorithm named Semi-Partition and a bottom-up algorithm named NEC Based Anonymization. By adapting to outliers and utilizing NEC (Natural Equivalence Class), both algorithms can reduce information loss and achieve better utility on high-dimensional data. Second, for high-missing-rate data publishing, we propose a bottom-up algorithm named KAIM and a top-down algorithm named Semi-Partition-Incomplete. By row-based and column-based missing value isolation, both algorithms can reduce missing value pollution and preserve more utility on high-missing-rate data. Third, for complex relational data publishing, we propose (k,l)-diversity, a hybrid privacy model that preserves privacy when publishing relational and transaction data (a special case of complex relational data). We then propose three algorithms, APA, PAA and 1M-Generalization, to achieve (k,l)-diversity. By uniting relational and transaction anonymization, our algorithms can achieve anonymity with low information loss on complex relational data.

On the basis of all this work, we propose a privacy-preserving data publishing and evaluation prototype system named PPDPES (Privacy Preserving Data Publishing and Evaluation System), which can anonymize high-dimensional, high-missing-rate and complex relational microdata without compromising privacy. Compared with existing work, our approaches are more reasonable, more efficient and more applicable, and can preserve more data utility during privacy-preserving data publishing.
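The two basic operations named in the abstract, generalization and suppression, can be illustrated with a minimal sketch. It is not one of the dissertation's algorithms (Semi-Partition, KAIM, APA and the others are not reproduced here). The toy records, the `generalize` and `anonymize` helpers, the choice of age and ZIP code as quasi-identifiers, and the group-size threshold `k` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy records: (age, zip_code, diagnosis). Not data from the dissertation.
records = [
    (23, "10001", "flu"),
    (27, "10002", "asthma"),
    (34, "10011", "flu"),
    (36, "10012", "diabetes"),
    (61, "20005", "flu"),
]


def generalize(record):
    """Coarsen the assumed quasi-identifiers: age to a 10-year band, ZIP to a 3-digit prefix."""
    age, zip_code, diagnosis = record
    low = age // 10 * 10
    return (f"{low}-{low + 9}", zip_code[:3] + "**", diagnosis)


def anonymize(rows, k):
    """Generalize every row, then suppress rows whose equivalence class has fewer than k rows."""
    generalized = [generalize(r) for r in rows]
    # An equivalence class is the set of rows sharing the same generalized quasi-identifiers.
    class_sizes = Counter((age, zc) for age, zc, _ in generalized)
    return [row for row in generalized if class_sizes[(row[0], row[1])] >= k]


published = anonymize(records, k=2)
for row in published:
    print(row)
```

With these toy values, the record of the 61-year-old forms an equivalence class of size one and is suppressed, while the other four records survive in coarsened form. The values that remain are still truthful, only less precise, which is the property the abstract attributes to generalization and suppression.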
Keywords/Search Tags: data anonymization, data privacy, data publishing, incomplete dataset, high-dimensional data, complex relational data