Research On Anonymity Methods To Improve The Data Utility Of Anonymous Data For Microdata Publishing

Posted on: 2015-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Ma
Full Text: PDF
GTID: 2298330431493429
Subject: Computer software and theory
Abstract/Summary:
Microdata play an increasingly important role in data analysis and scientific research, so many organizations collect and share them. However, publishing and sharing microdata puts individuals' privacy at risk. As a result, privacy preservation for microdata publishing has become a hot topic in data mining. At present, anonymization, owing to its security and effectiveness, has become a focus among microdata-oriented privacy protection methods. Anonymization is a data preprocessing approach whose goal is to reduce the probability that an attacker can uniquely identify an individual, thereby protecting individual privacy. Drawing on noise techniques and fuzzy rough sets, this paper studies clustering-based privacy protection. The main contributions are as follows:
(1) This paper presents a privacy-preserving data publishing method based on generalization and noise techniques. Generalization is a popular technique for implementing the k-anonymity model: it replaces the real value of a quasi-identifier with a less specific but semantically consistent value. When the original microdata are uniformly distributed, generalization anonymizes them effectively. When the distribution is uneven, however, generalization distorts the original data greatly. To address this problem, we propose an anonymization method that combines generalization with noise techniques, named the GN method. It reduces the degree of generalization by adding noise tuples during anonymization. We also propose a model for adding noise tuples that keeps the distribution of sensitive attribute values in the anonymous table as close as possible to that in the original table.
We implement GN with the GN-Bottom-up algorithm, and experimental results show that data anonymized by GN have lower average information loss and higher classification accuracy than data produced by the traditional generalization method.
(2) We also present a weighted clustering privacy-preserving data publishing method based on fuzzy rough sets. Data anonymized by existing privacy protection methods perform poorly in clustering applications. To address this problem, we propose a privacy protection method that takes attribute weights into account (FSRS). To improve the usefulness of the anonymized data for clustering, we obtain the clustering attribute weights with an objective weight allocation method based on fuzzy rough sets. We verify the effectiveness of the proposed method using the clustering analysis tools of Weka.
(3) Building on the second contribution, we propose an improved weighted clustering privacy-preserving data publishing method based on rough sets (PBRS). We obtain the clustering attribute weights from the rough-set definition of attribute importance. We again verify the effectiveness of the proposed method using the clustering analysis tools of Weka.
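
The generalization step described in point (1) can be illustrated with a minimal sketch. It is not the GN-Bottom-up algorithm from the thesis and adds no noise tuples; it only shows how quasi-identifier values are replaced by less specific ones until every quasi-identifier group contains at least k records. The toy records, the attribute names, and the fixed list of generalization levels are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy microdata: (age, zip_code, disease).
# age and zip_code are treated as quasi-identifiers, disease as sensitive.
RECORDS = [
    (23, "47677", "flu"),
    (27, "47602", "cancer"),
    (25, "47678", "flu"),
    (41, "47905", "gastritis"),
    (45, "47909", "flu"),
    (48, "47906", "cancer"),
]

# Assumed generalization levels, ordered from least to most general:
# (width of the age interval, number of leading zip digits kept).
LEVELS = [(5, 4), (10, 3), (20, 2), (100, 0)]


def generalize(record, age_width, zip_digits):
    """Replace quasi-identifier values with less specific ones."""
    age, zip_code, disease = record
    low = (age // age_width) * age_width
    age_range = f"{low}-{low + age_width - 1}"
    masked_zip = zip_code[:zip_digits] + "*" * (len(zip_code) - zip_digits)
    return (age_range, masked_zip, disease)


def is_k_anonymous(table, k):
    """True if every quasi-identifier combination occurs at least k times."""
    groups = Counter((age, zip_code) for age, zip_code, _ in table)
    return all(count >= k for count in groups.values())


def anonymize(records, k):
    """Return the least general level in LEVELS that yields k-anonymity."""
    for age_width, zip_digits in LEVELS:
        table = [generalize(r, age_width, zip_digits) for r in records]
        if is_k_anonymous(table, k):
            return (age_width, zip_digits), table
    raise ValueError("no generalization level in LEVELS satisfies k")


if __name__ == "__main__":
    level, table = anonymize(RECORDS, k=3)
    print("chosen level:", level)
    for row in table:
        print(row)
```

In this toy table a larger k forces a coarser level, which is the information loss that the abstract says GN tries to reduce by adding noise tuples instead of generalizing further.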
Keywords/Search Tags: privacy protection, k-anonymity, noise techniques, fuzzy rough set, generalization