
Study On Privacy Preserving Data Mining Under Horizontal Distribution

Posted on: 2016-11-06
Degree: Master
Type: Thesis
Country: China
Candidate: J W Shan
Full Text: PDF
GTID: 2348330482957881
Subject: Computer Science and Technology
Abstract/Summary:
Data mining aims to discover hidden rules and useful knowledge in large amounts of data through deeper and more complex analysis and modeling, and it is widely used in scientific research and real-life scenarios. Data mining can be classified from different perspectives, for example by mining method or by data distribution. After years of development, data mining has gradually evolved from centralized mining to distributed mining. For example, collaborative mining among companies can yield more accurate results. However, distributed mining implies data sharing, and companies often do not want to reveal their sensitive information to one another, because such data are central to their competitiveness. Traditional data mining operates directly on the data set, so how to protect sensitive information during mining has become an important issue in both research and commerce. Privacy-preserving data mining is bound to be an active area in the future.

This thesis addresses privacy-preserving data mining under horizontal distribution, covering clustering, classification, and association mining. We design algorithms that protect privacy during the mining process and carry out experiments to verify them. The specific work is as follows:

(1) To protect sensitive information in clustering, we design a privacy-preserving clustering scheme under horizontal distribution. First, each site runs the classical, stable K-Means algorithm on its local sample data, and secure multi-party computation (SMC) is used to obtain the initial global cluster centers. Second, in each iteration, every site assigns its local data to clusters according to the global cluster centers. At the end of each iteration, SMC is used again to determine the new cluster centers, so that the centers do not expose any site's specific data values. Finally, the process ends when the cluster centers no longer change between iterations.

(2) To protect sensitive information in classification, we propose a privacy-preserving classification scheme under horizontal distribution. First, the network weights are initialized with values randomly selected at one of the sites. Second, each site trains a BP (back-propagation) neural network on its own input sample data, and each round of computation produces weight increments. Homomorphic encryption, specifically the Paillier cryptosystem, is applied to each site's weight increments, which keeps the sites' data independent and prevents privacy leakage. Finally, the iterations continue until the error meets the required precision.

(3) To address privacy in association mining, we design a privacy-preserving algorithm for horizontally distributed data. The algorithm uses an improved, partition-based Apriori algorithm to find the candidate itemsets, and a third-party site computes the support counts of the candidates under homomorphic encryption. After decryption, the frequent k-itemsets are derived from the candidate set. Depending on the outcome, the procedure either stops or loops, with the sites generating the candidate (k+1)-itemsets. Because support counts are key information for each site and remain protected throughout the computation, each site's privacy is preserved.
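As a rough illustration of the additive homomorphic property that items (2) and (3) rely on, the following minimal sketch implements a toy Paillier cryptosystem and uses it to sum per-site support counts without decrypting any individual count. It is not the thesis's protocol: the function names, the tiny primes, the example counts, and the assumption that a single key holder decrypts only the aggregate are all hypothetical choices made for illustration, and the key size is far too small to be secure. Weight increments in item (2) would additionally need a fixed-point integer encoding, which is omitted here.

```python
import math
import random

def keygen(p, q):
    """Build a toy Paillier key pair from two distinct primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) as g^m * r^n mod n^2 with g = n + 1."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)

def decrypt(n, priv, c):
    """Recover m using L(u) = (u - 1) // n applied to c^lam mod n^2."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def aggregate_counts(n, ciphertexts):
    """Combine encrypted counts; the aggregator never sees a plaintext."""
    total = encrypt(n, 0)
    for c in ciphertexts:
        total = add_encrypted(n, total, c)
    return total

# Hypothetical example: three sites hold local support counts for one itemset.
n, priv = keygen(293, 433)            # toy primes, for illustration only
local_counts = [120, 75, 310]         # made-up per-site support counts
ciphertexts = [encrypt(n, c) for c in local_counts]
encrypted_total = aggregate_counts(n, ciphertexts)
print(decrypt(n, priv, encrypted_total))  # 505
```

In this sketch the aggregating party only multiplies ciphertexts, so it learns nothing about individual counts; only the global total is revealed on decryption, which is the quantity needed to compare against the minimum support threshold.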
Keywords/Search Tags: data mining, K-Means, BP neural network, homomorphic encryption