Research On Privacy-preservation Technology For Publishing Data

Posted on: 2010-06-05
Degree: Master
Type: Thesis
Country: China
Candidate: Z P Ceng
Full Text: PDF
GTID: 2178360278462383
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of information technology, government departments and commercial organizations store and release large amounts of personal information. Data publishing is an effective way to share information, but while it makes data exchange and data sharing more convenient, it also threatens personal privacy. Although a data publisher usually takes measures to hide personal identities before release, linking the released data with several external data sources can still lead to unexpected privacy disclosure. K-anonymity protects private data against such linking attacks and thereby addresses identity disclosure. However, it offers no protection against the disclosure of sensitive attributes, and it does not adequately account for how sensitive those attributes are. This thesis studies and analyzes current anonymization techniques for data publishing and proposes a new anonymization model and algorithm. The main contents and contributions are as follows:

Current privacy protection models for data publishing do not take the sensitivity of sensitive attributes into account. In fact, sensitive attributes with different levels of sensitivity should be protected with different strength. Based on this idea, a novel model, (p, a)-sensitive K-anonymity, is proposed. It divides sensitive attributes into groups according to their sensitivity and places a different restriction on each group.
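The idea of restricting each sensitivity group differently can be illustrated with a small check. This is only a sketch under stated assumptions, not the thesis's definition: the abstract does not spell out the (p, a) constraints, so the code assumes that every equivalence class must hold at least k records, at least p distinct sensitive values, and no more than a per-level maximum share of records at each sensitivity level. The names `level_of`, `max_share`, and `satisfies_sensitivity_limits` are hypothetical.

```python
from collections import Counter, defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for rec in records:
        key = tuple(rec[q] for q in quasi_identifiers)
        classes[key].append(rec)
    return classes


def is_k_anonymous(records, quasi_identifiers, k):
    """Plain K-anonymity: every equivalence class has at least k records."""
    classes = equivalence_classes(records, quasi_identifiers)
    return all(len(group) >= k for group in classes.values())


def satisfies_sensitivity_limits(records, quasi_identifiers, sensitive,
                                 k, p, level_of, max_share):
    """Assumed reading of a sensitivity-aware constraint (hypothetical).

    level_of maps each sensitive value to a sensitivity level;
    max_share maps each level to the largest fraction of a class
    that may carry values of that level.
    """
    for group in equivalence_classes(records, quasi_identifiers).values():
        if len(group) < k:
            return False
        values = [rec[sensitive] for rec in group]
        if len(set(values)) < p:
            return False
        level_counts = Counter(level_of[v] for v in values)
        for level, count in level_counts.items():
            if count / len(group) > max_share[level]:
                return False
    return True


# Illustrative, made-up data with already generalized quasi-identifiers.
sample = [
    {"age": "20-29", "zip": "100**", "disease": "flu"},
    {"age": "20-29", "zip": "100**", "disease": "HIV"},
    {"age": "20-29", "zip": "100**", "disease": "cold"},
    {"age": "30-39", "zip": "200**", "disease": "HIV"},
    {"age": "30-39", "zip": "200**", "disease": "cancer"},
    {"age": "30-39", "zip": "200**", "disease": "flu"},
]
levels = {"flu": "low", "cold": "low", "HIV": "high", "cancer": "high"}
```

In this made-up table both classes have three records, so it passes a plain K-anonymity check with k = 3, yet the second class is dominated by highly sensitive values. A per-level limit such as `{"low": 1.0, "high": 0.5}` would flag it, which is the kind of gap the sensitivity-aware model is meant to close.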
The experimental results suggest that the new model noticeably reduces privacy disclosure and strengthens the security of data publishing.

To address the shortcomings of current generalization-based anonymization algorithms, cluster analysis is introduced into the (p, a)-sensitive K-anonymity model. The K-anonymity problem is transformed into a clustering problem in which each cluster must contain at least K tuples. A distance-based clustering method computes the distance between tuples so that the tuples within each cluster are as similar to one another as possible. The corresponding distance definition, information loss formula, and generalization clustering algorithm are given; the correctness and complexity of the algorithm are analyzed and then verified through experimental examples.

To address the poor flexibility, large information loss, and over-generalization of existing generalization strategies, this thesis applies different generalization strategies to different types of quasi-identifier attributes, measures the information loss introduced by anonymization during clustering, and adopts a more flexible data generalization strategy. The experimental results show that, compared with traditional methods, this method effectively reduces information loss.

Finally, the thesis analyzes its own limitations and points for improvement and outlines directions for future work.
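The clustering view of K-anonymity can be sketched as follows. This is a minimal illustration under assumptions, not the algorithm from the thesis: the distance (normalized numeric gap plus categorical mismatch), the greedy seeding order, and the range-based information loss are stand-ins chosen for brevity, and all function names are hypothetical. The sensitivity constraints of the (p, a) model are not enforced here.

```python
def make_distance(records, numeric, categorical):
    """Build a tuple distance: normalized numeric gaps plus categorical mismatches."""
    spans = {}
    for attr in numeric:
        values = [rec[attr] for rec in records]
        spans[attr] = (max(values) - min(values)) or 1  # avoid division by zero

    def distance(r1, r2):
        d = sum(abs(r1[a] - r2[a]) / spans[a] for a in numeric)
        d += sum(r1[a] != r2[a] for a in categorical)
        return d

    return distance, spans


def cluster_k_members(records, k, distance):
    """Greedily build clusters of record indices, each with at least k members."""
    if len(records) < k:
        raise ValueError("need at least k records")
    remaining = list(range(len(records)))
    clusters = []
    while len(remaining) >= k:
        seed = remaining.pop(0)
        # Put the records closest to the seed first, then take k - 1 of them.
        remaining.sort(key=lambda i: distance(records[seed], records[i]))
        clusters.append([seed] + remaining[:k - 1])
        remaining = remaining[k - 1:]
    # Fewer than k records are left: attach each to the cluster with the nearest seed.
    for i in remaining:
        nearest = min(clusters, key=lambda c: distance(records[c[0]], records[i]))
        nearest.append(i)
    return clusters


def information_loss(records, clusters, numeric, categorical, spans):
    """Sum, over clusters, of cluster size times the width of its generalized values."""
    total = 0.0
    for cluster in clusters:
        rows = [records[i] for i in cluster]
        loss = sum(
            (max(r[a] for r in rows) - min(r[a] for r in rows)) / spans[a]
            for a in numeric
        )
        for a in categorical:
            domain = len({r[a] for r in records})
            distinct = len({r[a] for r in rows})
            loss += (distinct - 1) / (domain - 1) if domain > 1 else 0.0
        total += len(rows) * loss
    return total


# Illustrative, made-up data.
people = [
    {"age": 21, "sex": "F"}, {"age": 23, "sex": "F"}, {"age": 22, "sex": "M"},
    {"age": 45, "sex": "M"}, {"age": 47, "sex": "M"}, {"age": 50, "sex": "F"},
    {"age": 24, "sex": "F"},
]
dist, spans = make_distance(people, ["age"], ["sex"])
groups = cluster_k_members(people, 3, dist)
loss = information_loss(people, groups, ["age"], ["sex"], spans)
```

Each resulting cluster would then be generalized to a common value per quasi-identifier, for example an age range for numeric attributes and a value set or hierarchy node for categorical ones. Treating the two attribute types differently in both the distance and the loss mirrors the point about type-specific generalization strategies.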
Keywords/Search Tags: Data publishing, Sensitivity, K-anonymity, Privacy-preservation, Clustering