
Robust Data Anonymization Techniques In Privacy-Preserving Data Publishing

Posted on: 2022-08-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H R a z a u l l a h K h a
GTID: 1488306326979419
Subject: Information and Communication Engineering
Abstract/Summary:
The development of information and communication technologies has advanced every aspect of life and created a digital world in which every individual has easy access to modern technology. This enables information to be collected from individuals every second. Such data are generated in high volume and at high speed, which gives rise to Big Data. These data must be processed with advanced techniques to extract meaningful information. Publishing the collected data for research innovation and policy making without proper privacy processing can put individuals' privacy at stake. Eliminating privacy threats and protecting user information is therefore obligatory when releasing data, and this is the subject of Privacy-Preserving Data Publishing (PPDP). For PPDP, classical anonymization techniques, e.g. k-anonymity, l-diversity, t-closeness, and their variants, have been proposed. However, these techniques still suffer from serious privacy leakage and do not deliver both enhanced privacy and the required utility. In this dissertation, we investigate the privacy threats that exist in various privacy techniques. As mitigation, we propose a few robust anonymization techniques that are strong enough to address those threats. The basic purpose is to anonymize the data in such a way that an adversary cannot breach the privacy of an individual.

Our three research contributions are organized into two parts: (i) static data publishing (i.e. one-time release) in Part I, and (ii) dynamic data publishing (i.e. more than one release, with insert, update, and delete operations) in Part II. For static data, Privacy Preserving Static Data Publishing (PPSDP) techniques are proposed that anonymize data with a single Sensitive Attribute (SA) and with Multiple Sensitive Attributes (MSAs). For dynamic data, a Privacy Preserving Dynamic Data Publishing (PPDDP) technique is proposed for sequential anonymization of data. All the proposed techniques improve the privacy of the data and publish higher-quality data. The main contributions of the dissertation are as follows.

(1) To address the problem of l-diversity in a simpler way, a PPSDP technique named the θ-sensitive k-anonymity privacy model is proposed, which gives a numerical measure of privacy strength. The proposed model prevents two privacy vulnerabilities, the sensitive variance attack and the categorical similarity attack, identified in the p+-sensitive k-anonymity and balanced p+-sensitive k-anonymity privacy models. It works effectively for k-anonymous groups of all sizes and can prevent sensitive variance, categorical similarity, and homogeneity attacks by creating more diverse k-anonymous groups. Furthermore, formal modeling and analysis of the base and proposed privacy models show that the base models are invalidated and that the proposed work is applicable. Experiments show that the proposed model outperforms the others, improving privacy by 14.64%.

(2) A considerable amount of research exists on anonymizing a single sensitive attribute. However, the more practical PPSDP scenario with MSAs has not yet received enough attention. A recently proposed technique, (p,k)-angelization, provided a novel solution for MSAs that uses a one-to-one correspondence between the buckets in the Quasi Table (QT) and the Sensitive Table (ST). However, a possible privacy leakage is identified in (p,k)-angelization through the correlation of MSAs among linkable sensitive buckets, and it is named the fingerprint correlation (fcorr) attack. The proposed solution, named (c,k)-anonymization, prevents the privacy breaches in (p,k)-angelization. It thwarts the fcorr attack using some privacy measures and replaces the one-to-one correspondence with a one-to-many correspondence between the buckets in QT and ST. Experimental results on a real-world dataset demonstrate that the proposed work has zero vulnerable records and a comparatively lower privacy risk (0.11) than (p,k)-angelization (0.25), for example for a group size of 4.

(3) A more realistic scenario is PPDDP, where the original data are anonymized and released periodically. Each release may differ in the number of records because of insert, update, and delete operations. An intruder can combine, i.e. correlate, different releases to compromise the privacy of individual records. Most of the existing work, such as τ-safety and τ-safe (l,k)-diversity, has inconsistent record signatures and adds counterfeit tuples with high generalization, which causes privacy breaches and information loss. The proposed approach is an improved privacy model, (τ,m)-slicedBucket, which introduces a novel "Cache" table to address these limitations. It is shown that a collusion attack can breach the privacy of the τ-safe (l,k)-diversity privacy model, and this is demonstrated through formal modeling. The objective of the proposed (τ,m)-slicedBucket privacy model is to strike a tradeoff between strong privacy and enhanced utility. Furthermore, formal modeling and analysis of the proposed model show that the collusion attack is no longer applicable. The numerical results show that the proposed technique uses zero counterfeit tuples, while τ-safe (l,k)-diversity uses 4.7% counterfeit tuples.

Finally, this dissertation suggests a number of insights for possible future research.
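
For readers unfamiliar with the classical models named in the abstract, the following minimal Python sketch checks k-anonymity and distinct l-diversity on an already generalized table. It is an illustration only and not any of the models proposed in the dissertation; the function names, the column names (`age`, `zip`, `disease`), and the toy records are all hypothetical.

```python
from collections import defaultdict


def group_by_quasi_identifiers(records, quasi_ids):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[q] for q in quasi_ids)
        groups[key].append(rec)
    return dict(groups)


def is_k_anonymous(records, quasi_ids, k):
    """True if every quasi-identifier group holds at least k records."""
    groups = group_by_quasi_identifiers(records, quasi_ids)
    return all(len(g) >= k for g in groups.values())


def is_distinct_l_diverse(records, quasi_ids, sensitive, l):
    """True if every group holds at least l distinct sensitive values."""
    groups = group_by_quasi_identifiers(records, quasi_ids)
    return all(len({r[sensitive] for r in g}) >= l for g in groups.values())


# Hypothetical, already generalized toy table (not data from the dissertation).
records = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "cancer"},
    {"age": "30-39", "zip": "148**", "disease": "hepatitis"},
]

quasi_ids = ["age", "zip"]
print(is_k_anonymous(records, quasi_ids, k=2))
print(is_distinct_l_diverse(records, quasi_ids, "disease", l=2))
```

In this toy table every group has two records, so it is 2-anonymous, yet the first group carries a single disease value. That is the homogeneity problem mentioned in the abstract: an adversary who can place a person in that group learns the sensitive value without re-identifying a specific record.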
Keywords/Search Tags: Privacy-Preserving Data Publishing, k-anonymity, Big Data, Data Privacy