Research On Clustering Algorithm Based On Differential Privacy Protection

Posted on: 2019-12-16
Degree: Master
Type: Thesis
Country: China
Candidate: C Li
GTID: 2428330578972832
Subject: Software engineering
Abstract/Summary:
Big data has important applications in all walks of life. In an era where data is king, whoever masters the data holds the key to winning, and enterprises of all kinds pay more and more attention to what data can do. Seemingly inconspicuous everyday data can, once analyzed through data mining, reveal important and valuable information, which in turn guides forecasting for subsequent operation and management decisions. This is the value of data mining. However, all such data is ultimately data about people, so mining the potential associations in it inevitably involves personal data. How to ensure data privacy and prevent the leakage of personal information is therefore an important problem, and protecting user data during information mining has become a major research direction in privacy protection. Among the many privacy protection methods, differential privacy stands out because it rests on a rigorous mathematical foundation and offers a quantifiable privacy level. Combining it with data mining can effectively ensure that mining the data does not divulge private information. This thesis studies data mining algorithms based on differential privacy protection in the following aspects:

(1) An R-neighborhood distance outlier algorithm is proposed. Outliers are detected by a distance ratio, and the data set is then divided into several parts, which helps the later selection of initial center points for the DP K-means algorithm. Experiments show that the outlier algorithm has a considerable time advantage while still detecting outliers effectively, and that it is suitable for use with clustering algorithms.

(2) An improved DP K-means algorithm with outlier elimination (DP-ODK-means) is proposed. K-means based on differential privacy must improve data privacy while preserving the availability of the results. The algorithm reduces the randomness of the initial cluster center selection: using the improved distance-based outlier detection method, the initial cluster centers are selected from the subsets produced by the density division. This increases clustering efficiency, and Laplace noise is added to protect the original data. Experiments show that the method satisfies the differential privacy requirement and preserves the availability of the clustering results.

(3) A DP-MCDBScan algorithm based on differential privacy is proposed. The DP-DBScan method, which incorporates differential privacy, can effectively address information security in the clustering process and can handle data sets that contain a certain amount of noise. DP-MCDBScan improves on DP-DBScan by optimizing the method of selecting core points. This improves clustering accuracy when the privacy budget is low, reduces the time cost, and reduces the impact of the initial random selection on the clustering.
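
To make the role of Laplace noise in differentially private K-means concrete, here is a minimal sketch of one noisy centroid update. It is not the thesis's DP-ODK-means algorithm: it shows only a commonly used form of the DP K-means step, in which noise is added to per-cluster counts and coordinate sums, and it omits the outlier elimination and initial-center selection. The function names (`laplace_noise`, `nearest`, `dp_kmeans_step`), the assumption that every coordinate is scaled to [0, 1], and the single per-iteration budget `epsilon` are all illustrative assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def nearest(point, centers):
    """Return the index of the center closest to point (squared Euclidean)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])),
    )


def dp_kmeans_step(points, centers, epsilon, rng):
    """One noisy Lloyd update (illustrative sketch, not the thesis's method).

    Assumption: every coordinate of every point lies in [0, 1].
    """
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        j = nearest(p, centers)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]

    # Adding or removing one point changes one count by 1 and one cluster's
    # coordinate sums by at most dim in L1 norm, so the combined query has
    # L1 sensitivity dim + 1. Laplace noise of this scale on every released
    # value makes the step epsilon-differentially private.
    scale = (dim + 1) / epsilon

    new_centers = []
    for j in range(k):
        noisy_count = counts[j] + laplace_noise(scale, rng)
        if noisy_count < 1.0:
            # Too few (noisy) members to form a stable mean: keep the old center.
            new_centers.append(list(centers[j]))
            continue
        center = [
            (sums[j][d] + laplace_noise(scale, rng)) / noisy_count
            for d in range(dim)
        ]
        # Clip back into the assumed data range.
        new_centers.append([min(1.0, max(0.0, c)) for c in center])
    return new_centers
```

Running several such steps consumes the privacy budget additively, so a total budget would have to be divided among the iterations. A smaller `epsilon` means larger noise and less accurate centers, which is the trade-off between privacy and availability that the abstract describes.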
Keywords/Search Tags: Differential privacy protection, Data mining, DP-MCDBScan, Outlier elimination, Availability