
Research On Anonymity Techniques For Personalization Privacy-preserving Data Publishing

Posted on: 2013-09-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Wang
GTID: 1268330425466994
Subject: Computer application technology
Abstract/Summary:
In recent years, with the rapid development of the Internet and of data storage and computing technology, collecting and analyzing information has become increasingly convenient, complete, and accurate. However, publishing data for information sharing, data mining, knowledge discovery in databases (KDD), and similar purposes often carries a risk of disclosing sensitive private information. Research on privacy-preserving data publishing (PPDP) has emerged in response. Its main goal is to improve the security of released data by appropriately sacrificing some of the information in the raw data while still guaranteeing data utility, thereby achieving a good trade-off between privacy protection and data utility. Moreover, because different data subjects require different degrees of protection for their sensitive information, personalized privacy services have also become an active topic in PPDP. Building on this analysis, this dissertation presents an in-depth study of anonymity techniques for personalized privacy-preserving data publishing that preserve strong information utility.
Firstly, to meet the different privacy requirements of different individuals, we extend the traditional l-diversity principle into a personalized extended l-diversity principle by assigning a corresponding guarding attribute to each sensitive attribute. Generalizing the sensitive attribute to its guarding attribute satisfies each individual's personalized requirement for protecting the association between that individual and his or her sensitive attribute values. Based on these concepts, we define an individual-oriented personalized extended l-diversity anonymity model.
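The abstract does not give the formal definition of this model, so the following Python sketch is only one plausible reading under stated assumptions: sensitive values form a hypothetical taxonomy (`TAXONOMY`), each record carries a guarding node that is either its own value or an ancestor of it, and, for every record, the values in its equivalence class that fall under that record's guarding node are merged into the guarding node before distinct values are counted. The function names are illustrative, not taken from the dissertation. When every guarding node equals the record's own value, the check reduces to ordinary distinct l-diversity.

```python
# Hypothetical sensitive-value taxonomy: child -> parent ("ANY" is the root).
TAXONOMY = {
    "Flu": "Respiratory",
    "Pneumonia": "Respiratory",
    "Gastritis": "Digestive",
    "Ulcer": "Digestive",
    "Respiratory": "ANY",
    "Digestive": "ANY",
}


def ancestors_or_self(value):
    """Return [value, parent, grandparent, ...] up to the taxonomy root."""
    chain = [value]
    while chain[-1] in TAXONOMY:
        chain.append(TAXONOMY[chain[-1]])
    return chain


def collapse(value, guard):
    """Map value to guard if value lies in guard's subtree, else keep it."""
    return guard if guard in ancestors_or_self(value) else value


def satisfies_personalized_l_diversity(classes, l):
    """classes: dict mapping class id -> list of (sensitive_value, guarding_node).

    For every record, values under that record's guarding node are merged
    into one, and the class must still show at least l distinct values.
    """
    for rows in classes.values():
        values = [value for value, _ in rows]
        for _, guard in rows:
            if len({collapse(v, guard) for v in values}) < l:
                return False
    return True


classes = {
    "A": [("Flu", "Flu"), ("Gastritis", "Gastritis"), ("Pneumonia", "Respiratory")],
}
print(satisfies_personalized_l_diversity(classes, 2))  # True
print(satisfies_personalized_l_diversity(classes, 3))  # False
```

In the example class, three distinct diseases satisfy ordinary 3-diversity, but the third individual, who asked to hide membership in the whole Respiratory group, sees only two distinguishable values, so the personalized check fails for l = 3.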
In addition, because the l-diversity publishing principle requires the sensitive values within each released equivalence class to be sufficiently diverse, we adopt an inverse clustering approach to partition the released dataset into equivalence classes, and we propose a personalized extended l-diversity inverse clustering algorithm (PELI-clustering) that implements the individual-oriented model. We also analyze the correctness and complexity of the algorithm theoretically. Two sets of simulation experiments show that PELI-clustering yields information loss similar to that of the traditional clustering-based l-diversity algorithm at a lower time cost, while also meeting personalized service requirements and thus achieving more effective privacy preservation.
Secondly, individual-oriented personalization is limited in practice because it is hard to set personalization parameters for every individual in massive datasets. We therefore study a personalized privacy anonymity problem oriented toward sensitive values. Building on the traditional (α, k)-anonymity principle, we introduce a personalized privacy sensitivity factor and compute the degree of protection required by each sensitive attribute value, which provides a personalized service per sensitive value; we then give a formal definition of the sensitive-value-oriented personalized (α, k)-anonymity model. Because traditional generalization does not place interval boundaries precisely enough when partitioning equivalence classes, we also design an attribute entropy-based classification algorithm (EBCA) that uses the information entropy of quasi-identifying attributes as its classification criterion.
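Two quantities in this contribution can be made concrete. The Python sketch below computes the Shannon entropy of a quasi-identifying attribute column, the kind of measure EBCA is described as using, and checks a personalized (α, k)-anonymity condition: every equivalence class has at least k records, and each sensitive value s occurs in a class with relative frequency at most its own threshold α_s. The abstract does not say how α_s is derived from the sensitivity factor, so the per-value thresholds (`alpha`, `default_alpha`) are hypothetical inputs, and this is not the EBCA or EBCPPA procedure itself.

```python
import math
from collections import Counter


def attribute_entropy(column):
    """Shannon entropy (in bits) of one quasi-identifying attribute column."""
    n = len(column)
    # p * log2(1/p), written as (c/n) * log2(n/c) to avoid a negative zero
    return sum((c / n) * math.log2(n / c) for c in Counter(column).values())


def satisfies_personalized_alpha_k(classes, k, alpha, default_alpha=1.0):
    """classes: dict mapping class id -> list of sensitive values.

    alpha: hypothetical per-value thresholds; unlisted values use default_alpha.
    """
    for values in classes.values():
        n = len(values)
        if n < k:
            return False  # class too small for k-anonymity
        for value, count in Counter(values).items():
            if count / n > alpha.get(value, default_alpha):
                return False  # value too frequent for its own alpha
    return True


ages = [25, 25, 31, 31, 47, 47, 52, 52]
print(attribute_entropy(ages))  # 2.0

classes = {
    "g1": ["HIV", "Flu", "Flu", "Cold"],
    "g2": ["Cold", "Flu", "Cold", "Cold"],
}
alpha = {"HIV": 0.25, "Flu": 0.5}
print(satisfies_personalized_alpha_k(classes, 4, alpha))                 # True
print(satisfies_personalized_alpha_k(classes, 4, {**alpha, "HIV": 0.2}))  # False
```

Tightening only the threshold for the most sensitive value ("HIV") makes the same partition fail, which is the per-value behavior the personalized model is meant to capture.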
On this basis, we present an entropy-based classification approach for personalized privacy anonymity (EBCPPA) to implement the sensitive-value-oriented personalized (α, k)-anonymity model, and we analyze the correctness and complexity of the algorithm theoretically. Two sets of simulation experiments show that EBCA achieves higher accuracy than the classic C4.5, Naive Bayes, NBTree, and k-nearest neighbor (k = 3) algorithms, and that, compared with existing (α, k)-anonymity methods, EBCPPA not only provides personalized protection for sensitive values but also reduces the information loss of the dataset more effectively.
Thirdly, applying a privacy anonymity principle designed for datasets with a single sensitive attribute directly to datasets with multiple sensitive attributes cannot guarantee the security of private information. Taking into account the need for personalized service per sensitive value, we study a sensitive-value-oriented personalized anonymity method for multiple sensitive attributes. Building on the traditional l-diversity principle, we redefine l-diversity for multiple sensitive attributes by introducing the notion of a cover from topological spaces, and we prove the correctness and security of this definition. We also introduce a personalized customization scheme based on domain hierarchy partitions to meet the personalized requirements of different sensitive values. On this basis, we define a personalized multiple-sensitive-attribute l-diversity model and present a multiple-sensitive-attribute personalized l-diversity algorithm based on a minimum-selected-degree-first strategy (MSFMPL-diversity) to implement it.
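The claim that single-attribute principles are unsafe for multiple sensitive attributes can be illustrated with a small Python sketch. It does not reproduce the dissertation's cover-based definition or MSFMPL-diversity; instead it contrasts a naive per-attribute check with a common strengthening from the l-diversity literature that treats the other sensitive attributes as quasi-identifiers. The function names and the tuple layout of records (one value per sensitive attribute) are assumptions for illustration.

```python
from collections import defaultdict


def naive_multi_l_diverse(group, l):
    """Each sensitive attribute, checked on its own, has >= l distinct values."""
    n_attrs = len(group[0])
    return all(len({row[i] for row in group}) >= l for i in range(n_attrs))


def conditioned_multi_l_diverse(group, l):
    """Stricter check: attribute i must stay l-diverse even after an adversary
    partitions the group by the values of all other sensitive attributes."""
    n_attrs = len(group[0])
    for i in range(n_attrs):
        buckets = defaultdict(set)
        for row in group:
            others = row[:i] + row[i + 1:]  # remaining sensitive values act as QIs
            buckets[others].add(row[i])
        if any(len(vals) < l for vals in buckets.values()):
            return False
    return True


# One equivalence class; each row is (disease, salary).
leaky = [("Flu", "5k"), ("Cancer", "6k"), ("Flu", "6k")]
safe = [("Flu", "5k"), ("Cancer", "5k"), ("Flu", "6k"), ("Cancer", "6k")]
print(naive_multi_l_diverse(leaky, 2), conditioned_multi_l_diverse(leaky, 2))  # True False
print(naive_multi_l_diverse(safe, 2), conditioned_multi_l_diverse(safe, 2))    # True True
```

In `leaky`, each attribute has two distinct values, yet an adversary who knows the target's salary is 5k learns that the disease is Flu; only the conditioned check detects this.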
We also prove the convergence and local optimality of the algorithm. Simulation experiments show that, compared with the MBF and MMDCF algorithms under the same system thresholds, MSFMPL-diversity achieves a similar information hiding rate while meeting the personalized requirements of sensitive values, and it has better time performance and robustness.
Finally, we study the re-publication privacy problem for fully dynamic datasets, which undergo both external and internal updates, and propose a re-publication anonymity method for fully dynamic datasets that supports personalized updates of sensitive attribute values. Building on the disclosure-risk theory for re-publishing fully dynamic datasets, we introduce a personalized transition probability for attribute values to provide a personalized service for sensitive attribute updates, and we use it to assess the personalized re-publication disclosure risk. In addition, based on the m-unique principle, we present a personalized λ-continuity privacy anonymity principle for dynamic dataset re-publication, together with an incremental personalized λ-continuity re-publication anonymity approach for dynamic datasets (λ-PCRAADD) that implements the personalized λ-continuity model. The correctness and complexity of the algorithm are also analyzed theoretically. Simulation experiments show that, compared with the traditional m-invariance algorithm under the same system thresholds, λ-PCRAADD yields a far lower average relative error for aggregate queries while meeting the personalized requirements of sensitive values, with similar time performance and robustness.
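For reference, the traditional m-invariance baseline mentioned above can be sketched in Python as a check on a sequence of releases: every release must be m-unique (each equivalence class has at least m records, all with distinct sensitive values), and every record present in two consecutive releases must keep the same signature, that is, the set of sensitive values in its class. This is a sketch of the baseline condition only, not of λ-PCRAADD; it models only insertions and deletions, not the internal updates the dissertation addresses, and the release layout (record id mapped to a class id and a sensitive value) is an assumption.

```python
from collections import defaultdict


def _class_values(release):
    """Group sensitive values by equivalence class id."""
    groups = defaultdict(list)
    for class_id, value in release.values():
        groups[class_id].append(value)
    return groups


def is_m_unique(release, m):
    """Every class has >= m records, all with distinct sensitive values."""
    return all(
        len(vals) >= m and len(set(vals)) == len(vals)
        for vals in _class_values(release).values()
    )


def signatures(release):
    """Map each record id to the set of sensitive values in its class."""
    groups = _class_values(release)
    return {rid: frozenset(groups[cid]) for rid, (cid, _) in release.items()}


def is_m_invariant(releases, m):
    """releases: list of dicts record_id -> (class_id, sensitive_value)."""
    if not all(is_m_unique(r, m) for r in releases):
        return False
    for prev, curr in zip(releases, releases[1:]):
        sig_prev, sig_curr = signatures(prev), signatures(curr)
        for rid in sig_prev.keys() & sig_curr.keys():
            if sig_prev[rid] != sig_curr[rid]:
                return False  # a surviving record changed its signature
    return True


r1 = {"a": ("g1", "Flu"), "b": ("g1", "HIV"), "c": ("g2", "Cold"), "d": ("g2", "Flu")}
# c and d are deleted, e and f are inserted; a and b keep signature {Flu, HIV}.
r2 = {"a": ("h1", "Flu"), "b": ("h1", "HIV"), "e": ("h2", "Cold"), "f": ("h2", "Ulcer")}
r2_bad = {"a": ("h1", "Flu"), "e": ("h1", "Cold"), "b": ("h2", "HIV"), "f": ("h2", "Ulcer")}
print(is_m_invariant([r1, r2], 2))      # True
print(is_m_invariant([r1, r2_bad], 2))  # False
```

In `r2_bad`, record "a" moves from a class with signature {Flu, HIV} to one with {Flu, Cold}; intersecting the two signatures would reveal Flu, which is exactly the cross-release inference that m-invariance rules out.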
Keywords/Search Tags: Data publishing, Privacy preserving, Personalization, Data anonymization, Information loss