
Research On Privacy Preserving Classification Data Mining

Posted on: 2011-12-11  Degree: Master  Type: Thesis
Country: China  Candidate: B Tang  Full Text: PDF
GTID: 2178330338978805  Subject: Computer application technology
Abstract/Summary:
In recent years, with the development of database and network technology, large amounts of data have been accumulated, and extracting valuable knowledge from this data to support decision making has become a focus of attention. Data mining is a powerful tool for discovering hidden patterns and rules; it has contributed greatly in many areas and has a bright future. However, while data mining brings great benefits, the data being mined often contains personal information, such as patients' disease records, customers' preferences, and personal backgrounds, and disclosure of this private information can cause great harm. If such data is handed to data miners directly, private information is inevitably disclosed. As data mining is applied more and more widely, privacy disclosure is becoming an increasingly serious concern. For these reasons, how to carry out data mining under privacy protection has become a hot topic in data mining research, and privacy preserving data mining (PPDM) has emerged.

Privacy preserving classification mining based on decision trees has become a focus of study in recent years, because classification is the main type of data mining task and the decision tree is the most commonly used classifier. Currently there are many ways to modify data in privacy preserving data mining. Random perturbation, which does not change the essential characteristics of the original data, is the most widely used. However, existing privacy preserving classification methods have many defects: they are limited to particular data types, privacy breaches can still occur after random perturbation, the error rate in reconstructing the distribution of the original data is high, the degree of privacy preservation is low, and the accuracy of the mining results suffers. For these reasons, we propose a method for privacy preserving classification mining.
The method uses random perturbation matrices to transform the data, and reconstructs the distribution of the original data set using a joint multi-attribute perturbation matrix generated from the single-attribute perturbation matrices. We assign a distinct code to each value of every attribute in the original data set, so that the technique is applicable to a variety of data types. The technique selects a separate random perturbation matrix for each attribute to increase the degree of privacy preservation, and uses the γ-amplifying method to prevent privacy breaches after the data is transformed. We use the matrix condition number to reduce the error rate in reconstructing the distribution of the original data and to improve the accuracy of mining.
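
The abstract does not give the concrete matrices or algorithms, so the following is only a minimal sketch of the general idea, not the thesis's implementation. It assumes that attribute values are already encoded as integers 0..n-1, that each single-attribute matrix keeps a value with probability p and otherwise replaces it uniformly (an assumed form), and that the joint multi-attribute matrix is the Kronecker product of the single-attribute matrices (valid if attributes are perturbed independently, which is also an assumption). All function names are hypothetical.

```python
import random

def perturbation_matrix(n, p):
    """Assumed form: n x n column-stochastic matrix. Entry [i][j] is the
    probability that original value j is released as value i."""
    q = (1.0 - p) / (n - 1)
    return [[p if i == j else q for j in range(n)] for i in range(n)]

def kron(a, b):
    """Kronecker product: joint matrix for two independently perturbed attributes."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (pure Python)."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        d = aug[col][col]
        aug[col] = [v / d for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def norm1(m):
    """Matrix 1-norm: maximum absolute column sum."""
    return max(sum(abs(row[j]) for row in m) for j in range(len(m[0])))

def condition_number(m):
    """1-norm condition number; smaller values mean a more stable reconstruction."""
    return norm1(m) * norm1(inverse(m))

def amplification(m):
    """Largest ratio between two entries of the same row. A matrix is
    gamma-amplifying if this ratio does not exceed gamma."""
    return max(max(row) / min(row) for row in m)

def perturb(values, m, rng):
    """Replace each encoded value v by a draw from column v of m."""
    n = len(m)
    return [rng.choices(range(n), weights=[m[i][v] for i in range(n)])[0]
            for v in values]

def reconstruct(perturbed, m):
    """Estimate the original distribution x by solving m x = y, where y is
    the observed distribution of the perturbed values."""
    n = len(m)
    observed = [perturbed.count(i) / len(perturbed) for i in range(n)]
    inv = inverse(m)
    return [sum(inv[i][j] * observed[j] for j in range(n)) for i in range(n)]

if __name__ == "__main__":
    rng = random.Random(0)
    m = perturbation_matrix(3, 0.6)
    original = [0] * 5000 + [1] * 3000 + [2] * 2000
    released = perturb(original, m, rng)
    print("amplification:", amplification(m))
    print("condition number:", condition_number(m))
    print("estimated distribution:", reconstruct(released, m))
```

In this sketch, lowering p strengthens privacy (smaller amplification ratio) but raises the condition number, which magnifies sampling noise in the reconstructed distribution. That is the trade-off a condition-number criterion is meant to manage.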
Keywords/Search Tags: data mining, privacy preserving, decision tree, random perturbation matrix