Research On Several Key Problems Related To Anonymity Data In The K-anonymity Privacy-preserving Model

Posted on: 2013-02-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J L Song
Full Text: PDF
GTID: 1118330362962912
Subject: Computer application technology
Abstract/Summary:
Privacy protection faces growing challenges in data publishing as the Internet expands and more data becomes publicly available. An attacker may re-identify an individual's private information merely by linking a released dataset with related datasets. The k-anonymity privacy model is the most basic and most significant means of protection against linking attacks in data publishing. Recently, research has shifted its focus from k-anonymization methods to the anonymous data itself. This dissertation carries out a series of studies on key problems in the k-anonymity privacy model that concern anonymous data: quasi-identifier determination, optimal k-value selection, k-anonymous data generation, and update maintenance of k-anonymous datasets. The main contributions are as follows.

Firstly, a hypergraph-based method for determining the quasi-identifier is proposed. An accurate quasi-identifier is the decisive factor for both the validity of the k-anonymity model and the quality of the data. An oversized quasi-identifier leads to over-anonymization of the released data, while an undersized quasi-identifier may invalidate the k-anonymity model. The method maps the released views and the views awaiting release into a hypergraph, so that the problem of finding the set of related views is transformed into the problem of finding all paths between hypergraph nodes; a hypergraph-based algorithm for solving the view set is proposed on this basis. Building on it, a quasi-identifier solution algorithm is presented, its correctness is proved, and its time complexity is analyzed. The correctness of the algorithm is also validated by experiments.

Secondly, a method for selecting the optimal k-value is proposed. The key issue in Privacy-Preserving Data Publishing is how to balance privacy protection against data quality. Both the degree of protection and the data quality of a k-anonymous table depend on the k-value in the k-anonymity privacy-preserving model. Based on an analysis and proof of the relationships among k, privacy protection, and data quality, the range of k-values that satisfies the privacy-leak probability formula is analyzed for different situations. The range of k-values that satisfies the data quality requirement is likewise analyzed according to the quality formula for k-anonymous tables. A k-value optimization algorithm is then presented on the basis of the relationship between the k-values that satisfy the privacy protection requirement and those that satisfy the data quality requirement. Finally, the correctness of the algorithm is validated by experiments.

Thirdly, a data generation method that preserves data dependencies is presented. A new data dependency, K-MSD, is defined such that any dataset satisfying K-MSD also satisfies the k-anonymity constraint. When the data to be published contains no functional dependencies (FDs) or multivalued dependencies (MVDs), the K-MSD-ANONY algorithm constructs K-MSDs between attributes to achieve k-anonymity of the published data. When the data to be published contains FDs or MVDs, the K-MSD-AG-ANONY algorithm uses Association Generalization to preserve them while constructing K-MSDs between attributes to achieve k-anonymization. Finally, the performance of the K-MSD algorithm and the K-MSD-ANONY algorithm is compared experimentally with that of other algorithms.

Fourthly, an incremental update method for k-anonymous datasets is proposed. Because the base database is updated continuously, the k-anonymous dataset must be updated in step with it to remain consistent. After analyzing how the k-anonymous dataset may respond to insert, delete, modify, and update operations on the base database, the method first locates the affected tuples in the k-anonymous dataset by means of semantic similarity and tuple mapping, and then executes the update operation on those tuples. Finally, the update algorithms are validated by experiments.
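
For readers unfamiliar with the k-anonymity constraint that the abstract builds on, the following is a minimal sketch of the standard definition only: a table is k-anonymous with respect to a quasi-identifier if every combination of quasi-identifier values occurs in at least k rows. It is not any of the dissertation's algorithms. The table, the attribute names, and the function names `equivalence_class_sizes`, `max_k`, and `is_k_anonymous` are all hypothetical and chosen for illustration.

```python
from collections import Counter


def equivalence_class_sizes(rows, quasi_identifier):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[attr] for attr in quasi_identifier) for row in rows)


def max_k(rows, quasi_identifier):
    """Largest k for which the table is k-anonymous (0 for an empty table)."""
    sizes = equivalence_class_sizes(rows, quasi_identifier)
    return min(sizes.values(), default=0)


def is_k_anonymous(rows, quasi_identifier, k):
    """True if every equivalence class contains at least k rows."""
    return max_k(rows, quasi_identifier) >= k


# Hypothetical released table, already generalized on age and zip code.
released = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "asthma"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "diabetes"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
]

# Assumed quasi-identifier for this illustration.
QI = ("age", "zip")

print(max_k(released, QI))                  # smallest equivalence class size
print(is_k_anonymous(released, QI, 2))
print(max_k(released, QI + ("disease",)))   # a larger quasi-identifier
```

In this toy table the smallest equivalence class under `QI` has two rows, so the table is 2-anonymous but not 3-anonymous. Adding `disease` to the quasi-identifier shrinks the smallest class to a single row, which suggests why an oversized quasi-identifier would force further generalization to reach the same k.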
Keywords/Search Tags: Privacy-Preserving Data Publishing, K-anonymity, Quasi-identifier, K-value, K-anonymization, Update maintenance