
Research On Data Swapping Methods Based On Data Mining Privacy Protection

Posted on: 2013-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: N Ge
Full Text: PDF
GTID: 2248330371997599
Subject: Systems Engineering
Abstract/Summary:
With the rapid development of technology, data intruders can steal personal information more and more easily. A large disclosure risk often remains even when the identities of individuals are not present in the data. Because privacy disclosure incidents occur frequently, there are growing concerns about invasions of personal privacy through data mining. Related laws and media attention put data owners under great pressure when they release their data sets. Data owners who fail to protect personal privacy when releasing data face problems such as legal sanctions and loss of reputation. This paper therefore studies privacy protection in data mining from the data owner's standpoint.

This paper studies privacy protection in data mining from two sides: data protection and knowledge protection. The basic idea is to maximize data utility or knowledge. Data swapping is used to reduce the risk of personal privacy disclosure effectively.

First, this paper studies privacy protection from the data protection side. After studying an existing privacy protection method for data mining, which we call the EDP algorithm, we propose an improved version, which we call the improved EDP algorithm. The proposed algorithm significantly reduces time complexity, especially for a complete binary tree with memory, where the worst-case time complexity is O(M log M) and M is the number of internal nodes of the complete tree. The experimental results show that the proposed algorithm is feasible and more efficient, especially for large and more complex tree structures with more internal nodes. From a practical point of view, the improved EDP algorithm is more widely applicable and easy to implement.

At present, existing privacy protection methods for data mining have three problems.
The first is that existing methods cannot handle mixed numeric and categorical data. The second is that they cannot be applied to SDB privacy protection. The third is that they cannot provide the best balance between data utility and privacy protection. To solve these three problems, we propose a new privacy protection method for data mining, based on the idea of balancing data utility and privacy protection. Its key components are chi-square-based measures to evaluate the disclosure risk of individual records, an optimal pruning algorithm to identify high-risk records, and classified or random data swapping procedures to reduce the disclosure risk. The proposed method provides the best trade-off between data utility and privacy protection against privacy disclosure in data mining, while preserving the statistical properties of the data set and maintaining the relationships among attributes as far as possible. An experimental study on five UCI data sets shows that the proposed method is very effective for protecting privacy in data mining.

Then, this paper studies privacy protection from the knowledge protection side. There is much research on knowledge protection, but little on data protection, that is, personal privacy protection, within knowledge protection. We propose a method that effectively protects personal privacy in knowledge protection. Its basic idea is to maximize the accuracy of the data sets, and it uses data swapping. The proposed method not only protects sensitive information but also effectively reduces the risk of personal privacy disclosure.
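The random data swapping step mentioned above can be illustrated with a minimal sketch. It is not the thesis's algorithm: the function name `swap_sensitive_values`, the toy records, and the hand-picked list of high-risk record indices are all assumptions made for illustration, and the chi-square risk measure and pruning step are not reproduced here. The sketch only shows the general idea that permuting a sensitive attribute among flagged records breaks the link between a record and its value while leaving the attribute's overall distribution unchanged.

```python
import random
from collections import Counter


def swap_sensitive_values(records, flagged, attribute, seed=0):
    """Return a copy of `records` in which the values of `attribute`
    are randomly permuted among the flagged (high-risk) records only.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    rng = random.Random(seed)
    # Copy each record so the original data set is left untouched.
    swapped = [dict(r) for r in records]
    # Collect the sensitive values of the flagged records and shuffle them.
    values = [records[i][attribute] for i in flagged]
    rng.shuffle(values)
    # Write the shuffled values back to the same set of records.
    for i, value in zip(flagged, values):
        swapped[i][attribute] = value
    return swapped


# Toy data set (made up for this example).
records = [
    {"age": 34, "zip": "11001", "diagnosis": "flu"},
    {"age": 71, "zip": "11002", "diagnosis": "cancer"},
    {"age": 29, "zip": "11001", "diagnosis": "flu"},
    {"age": 65, "zip": "11003", "diagnosis": "diabetes"},
    {"age": 80, "zip": "11004", "diagnosis": "asthma"},
]

# Indices assumed to have been flagged as high-risk by some risk measure.
flagged = [1, 3, 4]

released = swap_sensitive_values(records, flagged, "diagnosis")

# The marginal distribution of the swapped attribute is preserved.
original_counts = Counter(r["diagnosis"] for r in records)
released_counts = Counter(r["diagnosis"] for r in released)
print(original_counts == released_counts)
```

Because the values are only permuted, single-attribute statistics are unchanged; relationships between the swapped attribute and the other attributes may be weakened, which is the utility cost a swapping method has to keep small.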
Keywords/Search Tags: data mining, privacy protection, data swapping, data utility, accuracy