Research On Privacy Preserving Data Mining

Posted on:2009-08-20

Degree:Doctor

Type:Dissertation

Country:China

Candidate:W J Yang

Full Text:PDF

GTID:1118360275954681

Subject:Computer application technology

Abstract/Summary:

Recent developments in networking and storage technologies make it increasingly convenient to collect, process, and publish large volumes of data, which also contain a great amount of personal privacy, business secrets, and classified information. Once the data is obtained, especially during the mining process, most of it can be used without any restriction. As a result, once the sensitive part is disclosed, it can seriously invade our privacy, disturb our normal life, or even threaten the security of our society. Data mining, as one of the most powerful technologies for knowledge discovery, reveals hidden information and data patterns in ordinary data. Although it brings knowledge and profit, there are severe problems in the way it deals with data. Concerns over data privacy have grown sharply, since anyone with access to the mining process can obtain the original data records, which leads to a high risk of data misuse.

Therefore, in recent years, a number of techniques have been proposed to solve these problems. In our research, we aim to provide a privacy preserving way of data mining by transforming the original data sets before the mining process. We have also developed several novel transformation techniques, so that accurate mining results can still be obtained while privacy is well protected. Our main contributions are as follows:

1. We proposed a definition of the essence of data privacy and two strategies for protecting it. We analyzed most of the current privacy preserving methods and discussed the structure of their privacy objects in detail. We found that few of their definitions accurately describe the essence of data privacy, which makes it difficult for the corresponding methods to provide comprehensive protection. Based on this understanding, we redefined data privacy in terms of data associations, which are much closer to the actual concept of privacy in everyday life. We also proposed two kinds of strategies to protect privacy under this new definition. In addition, at the beginning of the thesis, we introduced in detail the background of privacy protection and its fields of application.

2. We proposed a novel method of randomized anonymization to decompose data privacy, together with a mechanism to trade off accuracy against privacy, so that the threats from prior knowledge are eliminated. For the data publishing scenario, we proposed a data randomization method that applies our first strategy. It randomly replaces the data in each record using the distribution of the original data. Compared with the well-known k-anonymization techniques, our method not only offers a much higher level of privacy protection but also preserves the useful knowledge in the original data set. Furthermore, a user may use prior knowledge to infer sensitive information that he is not allowed to know. We therefore also developed a method to counteract the threats from this kind of knowledge in data publishing. While the method introduces more uncertainty into the inference of original values, it also provides a mechanism to balance privacy and accuracy.

3. We proposed protocols for data transmission and data integration to transform data privacy, so that the threats from malicious adversaries are counteracted, and we implemented customized privacy. Applying the second strategy, we presented an efficient clustering method for distributed multi-party data sets that uses orthogonal transformation and perturbation techniques. The miner, although receiving only the perturbed data, can still obtain accurate clustering results. This method protects data privacy not only in the semi-honest setting but also in the presence of collusion. Moreover, each attribute in a data set usually carries its own level of privacy concern, so the data owner needs a mechanism to customize the perturbation of his own data. We implemented customized privacy so that each variable in the data set can be perturbed according to its importance, as specified by the owner.

4. We proposed an extensible privacy preserving method that adapts to different numbers of participants, as well as a method to generate an independent perturbation. One of the main technical challenges for privacy preserving data mining is to make its algorithms adaptable to the number of participants while still keeping the privacy and accuracy guarantees. We analyzed how accuracy and privacy protection are influenced as the number of participants increases in the standard method, and we proposed an improved method that solves the problem for a large number of participants. Moreover, we proved the importance of independent perturbation and proposed a method that adapts to large data dimensions.
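
The third contribution relies on the fact that an orthogonal transformation preserves Euclidean distances, which is why a miner can cluster transformed records and still obtain the same distance-based results. The following is a minimal sketch of that property only, not the thesis's protocol: the function names (householder_matrix, transform, pairwise_distances) and the toy data are hypothetical, and a single Householder reflection stands in for whatever orthogonal matrix and additional perturbation the actual method uses.

```python
import math
import random


def householder_matrix(dim, rng):
    """Build a random Householder reflection I - 2vv^T/(v^T v), which is orthogonal."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm_sq = sum(x * x for x in v)
    return [
        [(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / norm_sq for j in range(dim)]
        for i in range(dim)
    ]


def transform(records, matrix):
    """Multiply every record (treated as a row vector) by the matrix."""
    dim = len(matrix)
    return [
        [sum(record[k] * matrix[k][j] for k in range(dim)) for j in range(dim)]
        for record in records
    ]


def pairwise_distances(records):
    """Euclidean distance between every unordered pair of records."""
    return [
        math.dist(a, b)
        for i, a in enumerate(records)
        for b in records[i + 1:]
    ]


rng = random.Random(0)
# Toy data set (assumption): 6 records with 4 numeric attributes each.
records = [[rng.uniform(0.0, 100.0) for _ in range(4)] for _ in range(6)]

q = householder_matrix(4, rng)
hidden = transform(records, q)

# Largest change in any pairwise distance caused by the transformation.
max_drift = max(
    abs(d0 - d1)
    for d0, d1 in zip(pairwise_distances(records), pairwise_distances(hidden))
)
print(f"largest change in any pairwise distance: {max_drift:.2e}")
```

The transformed values differ from the originals, yet every pairwise distance is unchanged up to floating-point rounding, so any clustering algorithm that depends only on Euclidean distances behaves identically on both data sets. A single reflection is easy to invert and would not by itself provide the protection against collusion claimed above; it only illustrates the distance-preserving property.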

Keywords/Search Tags:

Data mining, Privacy preserving, Randomization, Anonymization, Knowledge preserving, Priori knowledge, Orthogonal transformation, Customized privacy, Scalability

Related items

1	Privacy Preserving Association Rules Mining Algorithm Based On Random Orthogonal Transformation
2	Anonymization-based Research On Privacy Preserving Data Publishing In ERP Systems
3	Privacy Preserving Support Vector Machine Based On Orthogonal Transformation And Secure Dot-Products
4	Research On Privacy Preserving Data Mining And Knowledge Discovery
5	Research On Privacy Preserving Methods For Data Mining
6	Privacy-preserving data mining through data publishing and knowledge model sharing
7	Research On Anonymization Based Privacy Preserving Method On Geosocial Networks
8	Research On Privacy Preserving Association Rules Mining
9	A Study Of Privacy Preserving Anonymization Techniques In Combinatorial Maps
10	Privacy Preserving In Association Rule Mining