Research On Distributed Data Mining Based On Privacy Preserving

Posted on: 2019-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: F Su
Full Text: PDF
GTID: 2348330545455704
Subject: Electronic Science and Technology
Abstract/Summary:
In the era of big data, decentralized data storage has become the main trend. How to mine data distributed across different sites and obtain global knowledge has become a hot issue in the field of data mining. Distributed data mining offers a way to address this, but it raises two problems of its own: how to apply centralized data mining algorithms to a distributed system, and how to protect the data privacy of each site. This thesis studies both problems. First, the basic concepts, the mining process, and the main techniques of data mining are introduced, and the common clustering methods are analyzed. For the K-means clustering algorithm, the number of clusters and the initial cluster centers have a great influence on the clustering results. Therefore, a Cell-based K-means clustering algorithm is proposed. The improved algorithm not only removes the need to determine the number of clusters in advance, but also effectively computes initial cluster centers that are close to the true cluster centers. Second, the data partitioning and the system characteristics of distributed data mining are studied. Based on this study, secure multi-party computation is adopted as the privacy preserving technology in the distributed environment, and its concepts, models, and protocols are examined. Finally, by analyzing the characteristics of the traditional distributed K-means clustering algorithm, a privacy preserving distributed clustering algorithm is proposed. It combines the improved K-means clustering algorithm with secure multi-party computation to implement privacy preserving collaborative data mining in the distributed system. The experimental results show that the proposed algorithm not only obtains stable and accurate global clustering results, but also protects the original data and the intermediate calculation results of each distributed site.
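The abstract does not say which secure multi-party computation protocol the thesis uses. As a rough illustration of how the centroid update in a distributed K-means round could hide each site's intermediate results, the sketch below uses a secure sum built on additive secret sharing. Everything in it is an assumption made for illustration: horizontally partitioned data, semi-honest sites, the modulus `PRIME`, the fixed-point factor `SCALE`, and the function names `make_shares`, `secure_sum`, and `global_centroid`. All parties are simulated in one process, so it shows only the arithmetic, not the network exchange.

```python
import random

PRIME = 2**61 - 1   # assumed modulus for share arithmetic
SCALE = 10**6       # assumed fixed-point factor for encoding floats as integers


def make_shares(value, n_parties, rng):
    """Split an integer into n additive shares that sum to value mod PRIME."""
    shares = [rng.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(private_values, rng):
    """Simulate a secure sum: each party splits its value into shares,
    sends one share to every party, and only the total is reconstructed."""
    n = len(private_values)
    received = [[] for _ in range(n)]  # received[j] holds shares sent to party j
    for value in private_values:
        for j, share in enumerate(make_shares(value % PRIME, n, rng)):
            received[j].append(share)
    # Each party publishes only the sum of the shares it received.
    partial_sums = [sum(r) % PRIME for r in received]
    total = sum(partial_sums) % PRIME
    # Map the upper half of the field back to negative integers.
    return total - PRIME if total > PRIME // 2 else total


def global_centroid(local_sums, local_counts, rng):
    """Combine per-site coordinate sums and point counts for one cluster
    into the global centroid without revealing any single site's values."""
    count = secure_sum(local_counts, rng)
    dim = len(local_sums[0])
    centroid = []
    for d in range(dim):
        encoded = [round(site[d] * SCALE) for site in local_sums]
        centroid.append(secure_sum(encoded, rng) / SCALE / count)
    return centroid


if __name__ == "__main__":
    rng = random.Random()
    # Hypothetical cluster: site A holds 2 points summing to (4, 6),
    # site B holds 1 point at (5, 6).
    print(global_centroid([[4.0, 6.0], [5.0, 6.0]], [2, 1], rng))
```

In this sketch only the aggregate sum and the aggregate count for each cluster are revealed, and those are exactly what the centroid update needs. A real deployment would draw shares from a cryptographically secure source such as Python's `secrets` module rather than `random`, and would send the shares over authenticated channels between sites.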
Keywords/Search Tags: distributed data mining, cell, privacy preserving, secure multi-party computation