Research Of Classification Based On Anonymized Data

Posted on: 2012-09-21
Degree: Master
Type: Thesis
Country: China
Candidate: R C Zhang
Full Text: PDF
GTID: 2178330338992288
Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology, and of database technology in particular, collecting, managing, and analyzing massive data has become convenient. Data mining techniques, including classification, play an active role in many in-depth applications. At the same time, they raise serious privacy concerns. Data mining brings great benefits, but the data being mined contain a great deal of personal information, and disclosure of that information can cause great harm. If the data are handed to data miners unprotected, disclosure of private information is inevitable. As data mining is applied more widely, privacy disclosure becomes an increasingly serious problem. For these reasons, how to perform data mining under privacy protection has become a major focus of data mining research.

Classification is an active research field in data mining. Many techniques have been proposed for it: decision tree classification, nearest neighbor classification, neural network classification, support vector machine classification, and Bayesian classification. However, these algorithms operate on the original data and can easily disclose private information. As the study of uncertain data has deepened, uncertain data mining has become a hot topic, and there is a trend toward extending traditional classification to uncertain data.

This project focuses on classification based on anonymized data, modeling data anonymized by k-anonymity as uncertain data. We propose a new approach for building classifiers using anonymized data. The method does not assume any probability distribution for the data. Instead, we propose collecting all necessary statistics during anonymization and releasing them together with the anonymized data as new attributes.
Each new attribute consists of the expected value and variance for a numerical quasi-identifier, or the probability mass function for a categorical quasi-identifier. With these statistics, the expected values of kernel functions or squared distances can be calculated easily. Finally, we use the classifier to classify over anonymized data.

This paper proposes an improved method for building a classifier using anonymized data, KCNN-SVM. The method makes classification over anonymized data possible, improves the classification algorithm, and improves classification efficiency.
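As an illustration of how released statistics can stand in for the original values, the following is a minimal Python sketch, not the thesis's KCNN-SVM implementation. It assumes that the two records being compared are independent, so that the expected squared difference of a numerical attribute is (mean_a - mean_b)^2 + var_a + var_b, and it assumes a 0/1 mismatch distance for categorical attributes. The function names, the use of population variance, and the example attributes (`age`, `job`) are hypothetical choices made for this sketch.

```python
from collections import Counter
from statistics import fmean, pvariance


def summarize_group(records, numeric_keys, categorical_keys):
    """Collect the statistics for one equivalence class (a group of >= k records).

    Numerical quasi-identifiers become (expected value, variance);
    categorical quasi-identifiers become a probability mass function.
    Population variance is an assumption of this sketch.
    """
    summary = {}
    for key in numeric_keys:
        values = [r[key] for r in records]
        summary[key] = (fmean(values), pvariance(values))
    for key in categorical_keys:
        counts = Counter(r[key] for r in records)
        summary[key] = {v: c / len(records) for v, c in counts.items()}
    return summary


def expected_squared_distance(a, b, numeric_keys, categorical_keys):
    """Expected squared distance between two anonymized records.

    Assumes the two records are independent random variables.
    """
    total = 0.0
    for key in numeric_keys:
        (mean_a, var_a), (mean_b, var_b) = a[key], b[key]
        # E[(X - Y)^2] = (E[X] - E[Y])^2 + Var[X] + Var[Y] under independence
        total += (mean_a - mean_b) ** 2 + var_a + var_b
    for key in categorical_keys:
        pmf_a, pmf_b = a[key], b[key]
        # Probability that the two categorical values coincide
        match = sum(p * pmf_b.get(v, 0.0) for v, p in pmf_a.items())
        # Expected 0/1 mismatch distance (an assumed choice of distance)
        total += 1.0 - match
    return total


# Hypothetical equivalence classes with k = 2
group_1 = [{"age": 20, "job": "nurse"}, {"age": 30, "job": "nurse"}]
group_2 = [{"age": 40, "job": "nurse"}, {"age": 60, "job": "clerk"}]

stats_1 = summarize_group(group_1, ["age"], ["job"])
stats_2 = summarize_group(group_2, ["age"], ["job"])
distance = expected_squared_distance(stats_1, stats_2, ["age"], ["job"])
```

A distance of this kind could feed a nearest neighbor rule directly, and a similar expectation could be taken over a kernel function for a support vector machine. Which kernels the thesis actually uses is not stated in the abstract.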
Keywords/Search Tags: privacy preservation, classification for anonymized data, KCNN-SVM