
A Personalized Privacy Protection Anonymous Method For Data Publishing

Posted on: 2023-01-28    Degree: Master    Type: Thesis
Country: China    Candidate: H Lu    Full Text: PDF
GTID: 2530307103981349    Subject: Applied statistics
Abstract/Summary:
As society enters the era of big data, vast amounts of data are produced every day. While data publishing enables information sharing, it also threatens the privacy of the individuals behind the data, so preventing privacy leakage during data release has become a research hotspot in information security in recent years. Data anonymization is attractive because its principle is simple, its information loss is small, and it balances the availability and the security of the data. Since Latanya Sweeney et al. formally proposed k-anonymity, experiments have shown that the k-anonymity model can effectively resist the identity disclosure caused by linking attacks during data release, but it cannot prevent attribute disclosure. To address this deficiency, improved models such as l-diversity, t-closeness, p-sensitive k-anonymity, and (α,k)-anonymity have been proposed one after another, yet each still has shortcomings. Building on the p-sensitive k-anonymity model and the (α,k)-anonymity model, this thesis proposes a personalized privacy protection anonymity model for data publishing. The main innovations of this thesis are as follows:

Firstly, the proposed model integrates the strengths of the p-sensitive k-anonymity model, which resists homogeneity attacks by requiring diversity among sensitive attribute values, and of the (α,k)-anonymity model, which resists skewness attacks by restricting the frequency with which sensitive attribute values occur.

Secondly, sensitive attribute values are grouped by semantic similarity, so that semantically similar values are scattered into different equivalence classes as far as possible, which resists similarity attacks.

Thirdly, to address the problem that the (α,k)-anonymity model applies the same frequency threshold α to all sensitive attribute values, sensitive attribute values are assigned sensitivity levels according to the preferences of the data owners, and the frequency constraint is lifted from individual sensitive values to whole sensitivity levels. A different threshold α is set for each level: a lower threshold for high-sensitivity levels and a higher threshold for low-sensitivity levels.

Fourthly, since individuals differ in how much privacy protection they need, each individual is allowed to set a privacy protection level for his or her own sensitive information, and generalization is applied according to that level. This avoids over-protecting some sensitive attribute values while under-protecting others, thereby realizing personalized privacy protection.

To verify the performance of the proposed model, simulation experiments are conducted on the Adult benchmark data set from the UCI Machine Learning Repository. Compared with the k-anonymity model and the (α,k)-anonymity model, the experimental results show that the proposed model achieves better privacy protection with acceptable information loss.
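To make the combined constraints described above concrete, the following Python sketch checks a single equivalence class against the three numeric conditions the model layers together: a minimum class size k (k-anonymity), at least p distinct sensitive values (p-sensitivity), and a frequency ceiling α applied per sensitivity level rather than per value. The function name, the value-to-level mapping, and the threshold values are illustrative assumptions for exposition only, not the algorithm or parameters actually used in the thesis.

```python
from collections import Counter

def satisfies_constraints(values, sensitivity_level, alpha_by_level, k=5, p=3):
    """Check one equivalence class (hypothetical sketch, not the thesis code).

    values            -- list of sensitive attribute values in the class
    sensitivity_level -- maps a sensitive value to its level, e.g. 'high'/'low'
    alpha_by_level    -- maps a level to its maximum allowed frequency (0, 1];
                         more sensitive levels get smaller thresholds
    """
    n = len(values)
    if n < k:                      # k-anonymity: at least k records per class
        return False
    counts = Counter(values)
    if len(counts) < p:            # p-sensitivity: at least p distinct values
        return False
    # Aggregate occurrences by sensitivity level and compare each level's
    # share of the class against its level-specific threshold alpha.
    level_counts = Counter()
    for value, c in counts.items():
        level_counts[sensitivity_level[value]] += c
    return all(level_counts[lvl] / n <= alpha_by_level[lvl]
               for lvl in level_counts)

# Illustrative example: 'HIV' and 'cancer' treated as high sensitivity,
# 'flu' and 'cold' as low sensitivity (assumed groupings).
level = {"HIV": "high", "cancer": "high", "flu": "low", "cold": "low"}
alpha = {"high": 0.2, "low": 0.9}
equivalence_class = ["flu", "cold", "flu", "HIV", "cold", "flu"]
print(satisfies_constraints(equivalence_class, level, alpha, k=5, p=3))  # True
```

Setting a stricter α for the "high" level mirrors the rule stated above: the more sensitive the level, the lower the frequency with which its values may appear inside any published equivalence class.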
Keywords/Search Tags: privacy protection, k-anonymity, data publication, similarity attack