
The Research On Privacy-preserving Data Publishing For Data Classification Analysis

Posted on: 2015-04-02  Degree: Master  Type: Thesis
Country: China  Candidate: J D Wu  Full Text: PDF
GTID: 2298330431993448  Subject: Computer software and theory
Abstract/Summary:
Driven by urgent demand for data sharing, privacy-preserving data publishing has made great progress. Data mining requires large amounts of data, and data publishers are its direct providers, so how to mine data successfully while protecting the privacy of that data has become a hot topic. Data privacy protection is now one of the active research areas in data mining, and the k-anonymity model is an important approach to achieving it. However, existing k-anonymity methods do not consider how to mask data for a specific application. It is therefore meaningful to anonymize with a specific application in mind and to obtain better anonymous data for that application.

Based on studies of data mining classification and privacy models, this paper proposes anonymization techniques for data mining classification. These techniques no longer aim to minimize information loss; instead, they require the anonymization process to affect classification as little as possible. The paper also proposes the concept of attribute weight, which reflects the role each attribute plays in the classification process. The importance of an attribute in classification varies with its contribution to classification performance, so we define a weight based on that contribution and generalize different attributes to different degrees. Research on k-anonymity for classification has recently attracted growing attention, and many new k-anonymity methods have been proposed that meet privacy requirements while maintaining data availability for data mining. In this paper, we study privacy protection technology from two aspects, data availability and safety. The specifics are as follows:

(1) This paper proposes an anonymization method for data mining classification with weighted attributes. We compute the weight of each attribute using the information gain ratio. Based on these weights, the method generalizes large-weight attributes to a lower degree and small-weight attributes to a higher degree. We then propose an attribute-weighted Bottom-Up k-anonymity algorithm. Experimental results show that the proposed method generates higher-quality anonymous data for classification analysis than traditional methods.

(2) The paper also proposes an anonymization method for classification. The method first determines the best generalization level of each attribute for classification based on the information gain ratio, and then generalizes each attribute to that level during the anonymization process. We also define a suppression policy to handle the tuples that do not satisfy the anonymity constraint. On the basis of this method, we propose a Weighted Full-Domain Generalization (WFDG) algorithm. The experimental results show that the method produces higher-quality anonymous data for classification mining.
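As an illustration of the weighting idea in item (1), the following is a minimal sketch of how attribute weights could be derived from the information gain ratio. It is not the thesis's algorithm: the function names (`entropy`, `gain_ratio`, `attribute_weights`), the normalization of gain ratios so the weights sum to 1, and the toy records are all assumptions made for this example.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(values, labels):
    """Information gain ratio of one attribute with respect to the class labels."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Expected class entropy after splitting on the attribute.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(values)  # entropy of the attribute's own value distribution
    return info_gain / split_info if split_info > 0 else 0.0


def attribute_weights(rows, attributes, class_attr):
    """Hypothetical weighting: gain ratios normalized to sum to 1 (an assumption)."""
    labels = [row[class_attr] for row in rows]
    ratios = {a: gain_ratio([row[a] for row in rows], labels) for a in attributes}
    total = sum(ratios.values())
    if total == 0:
        # No attribute is informative; fall back to equal weights.
        return {a: 1 / len(attributes) for a in attributes}
    return {a: ratios[a] / total for a in attributes}


# Toy records (invented for illustration): "age" determines the class, "zip" does not.
rows = [
    {"age": "young", "zip": "A", "income": "low"},
    {"age": "young", "zip": "B", "income": "low"},
    {"age": "old", "zip": "A", "income": "high"},
    {"age": "old", "zip": "B", "income": "high"},
]
weights = attribute_weights(rows, ["age", "zip"], "income")
print(weights)
```

Under this sketch, an attribute with a larger weight (here "age") would be generalized less aggressively than one with a smaller weight (here "zip"), which matches the direction described in the abstract; how the weights actually enter the Bottom-Up or WFDG algorithms is not specified there.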
Keywords/Search Tags: classification, data mining, information gain, data perturbation, privacy preservation, k-anonymity