
Research On Clustering Optimization Method Supporting Differential Privacy Protection

Posted on: 2019-08-28
Degree: Master
Type: Thesis
Country: China
Candidate: J Ren
Full Text: PDF
GTID: 2428330575973633
Subject: Software engineering
Abstract/Summary:
The rapid development of the Internet and big data has driven exponential growth in data volume. These data come from a wide range of sources and take many forms. Data analysts usually apply data mining methods to extract useful information, and clustering analysis, an important branch of data mining, is often used to analyze such data. However, as people pay more attention to the private information contained in data, data analysis also raises privacy protection issues that urgently need to be solved. Differential privacy is an emerging privacy protection technology that is often used to protect data privacy during clustering analysis. How to balance the added noise against the availability of the clustering results has therefore become an active research question. To address these problems, the main work of this thesis is as follows:

(1) To address the privacy leakage that can easily occur while running the K-means clustering algorithm, existing studies apply a differential privacy mechanism to protect private information during clustering. However, the added noise lowers the availability of the clustering results. This thesis therefore improves the algorithm: while preserving the privacy guarantee of the K-means process, it enhances the availability of the clustering results by improving outlier detection and the distance calculation method. The thesis describes the working principle and structure of the proposed algorithm in detail. Experiments then verify the feasibility of the algorithm and show that it improves the availability of the clustering results while keeping private information secure.

(2) A single-machine K-means algorithm is inefficient on large-scale datasets, and clustering under the MapReduce framework faces the same privacy problems. Building on the first contribution, this thesis proposes a differentially private K-means algorithm based on the MapReduce framework. The algorithm combines the parallel composition and sequential composition properties of differential privacy and protects private information through appropriate parameter settings. The thesis then analyzes the security of the algorithm and verifies it through experiments, which show that it effectively addresses privacy leakage and improves efficiency in distributed clustering.
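The general idea of differentially private K-means can be illustrated with a minimal sketch. This is not the thesis's algorithm: it omits the outlier detection, the modified distance calculation, and the MapReduce implementation, and it follows a common textbook-style scheme instead. It assumes every coordinate lies in [0, 1], so that one record changes a cluster's count and coordinate sums by at most d + 1 in total, and it splits the budget evenly over iterations (sequential, or serial, composition) while relying on clusters being disjoint within an iteration (parallel composition). The function names and the even budget split are illustrative assumptions.

```python
import random


def laplace(scale, rng):
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centers[j])),
    )


def dp_kmeans_step(points, centers, eps_iter, rng):
    """One noisy update; assumes all coordinates are in [0, 1]."""
    k, d = len(centers), len(centers[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for p in points:
        j = nearest(p, centers)
        counts[j] += 1
        for i in range(d):
            sums[j][i] += p[i]

    # Assumed sensitivity: 1 for the count plus d for the coordinate sums.
    scale = (d + 1) / eps_iter
    new_centers = []
    for j in range(k):
        noisy_count = counts[j] + laplace(scale, rng)
        if noisy_count < 1.0:
            # Too few (noisy) members: keep the previous center.
            new_centers.append(list(centers[j]))
            continue
        center = [(sums[j][i] + laplace(scale, rng)) / noisy_count for i in range(d)]
        # Clamping is post-processing and does not affect the privacy guarantee.
        new_centers.append([min(1.0, max(0.0, c)) for c in center])
    return new_centers


def dp_kmeans(points, init_centers, epsilon, iterations, seed=0):
    """Run a fixed number of noisy iterations with an even budget split."""
    rng = random.Random(seed)
    eps_iter = epsilon / iterations  # sequential composition across iterations
    centers = [list(c) for c in init_centers]
    for _ in range(iterations):
        centers = dp_kmeans_step(points, centers, eps_iter, rng)
    return centers
```

The initial centers are assumed to be chosen without looking at the data. A smaller `epsilon` gives a larger noise scale and hence less accurate centers, which is the privacy versus availability trade-off the abstract describes.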
Keywords/Search Tags:Privacy Leakage, Clustering Analysis, K-means Algorithm, Differential Privacy Mechanism, MapReduce Framework