Research On HADPK-means++ Clustering Algorithm Supporting Differential Privacy Protection

Posted on: 2022-09-15  Degree: Master  Type: Thesis
Country: China  Candidate: F G Xu  Full Text: PDF
GTID: 2518306326998829  Subject: Master of Engineering
Abstract/Summary:
With the development of information technology, the massive data generated by electronic devices, such as patient diagnosis records at medical institutions and customer records at banks, contains highly valuable information. As a typical unsupervised data mining method, cluster analysis can extract unknown knowledge and potential value from such data. However, mining useful information may also leak personal private information, and privacy protection technology emerged to address this problem. Differential privacy, a relatively new data distortion technique with rigorous mathematical guarantees, has been widely studied in recent years because its guarantee does not depend on the attacker's background knowledge. Introducing differential privacy noise into the clustering process protects sensitive data and prevents privacy leakage, but the noise perturbation reduces the availability (utility) of the clustering results. How to improve the availability of clustering algorithms while protecting sensitive data therefore remains an open problem.

The differentially private k-means clustering algorithm satisfies differential privacy by publishing an estimated value of each cluster center in every iteration, which effectively avoids privacy leakage. However, perturbing the cluster centers introduces random errors that make them deviate from the true centers and ultimately degrade the clustering result. In response, this thesis proposes a highly available differentially private k-means++ clustering algorithm (HADPK-means++). First, to address the sensitivity of the algorithm to the choice of initial centers, an initial center selection algorithm based on reverse order is proposed, which makes the selection more stable and accurate. Second, to reduce the influence of random errors caused by noise perturbation, a similarity measure based on inter-cluster and intra-cluster similarity is proposed to make the partition of clusters more accurate. Finally, based on the transformation invariance of differential privacy, a cluster center correction mechanism is proposed that addresses cluster center deviation and prevents the extreme case in which all samples are assigned to the same cluster. Comparative experiments show that, at the same level of privacy protection, the proposed algorithm achieves higher clustering availability than existing differentially private k-means clustering algorithms.

Recommendation systems make personalized recommendations based on user behavior data. With a large number of users, however, searching for nearest neighbors takes too long, and using the data may reveal users' private information. To address this, this thesis applies the HADPK-means++ algorithm to collaborative filtering recommendation. First, HADPK-means++ is run on the rating matrix to protect user privacy; nearest neighbors are then searched only within the same cluster, which narrows the search range; finally, predicted ratings are generated and recommendations are made to users. The experimental results show that collaborative filtering recommendation based on HADPK-means++ achieves a balance between recommendation accuracy and privacy. They also suggest that the HADPK-means++ clustering algorithm can be applied to other data mining scenarios to protect users' private information.
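
The baseline mechanism described in the abstract, publishing a noisy estimate of each cluster center in every iteration, can be illustrated with a minimal Python sketch. This is not the HADPK-means++ algorithm, whose details are not given in this abstract. The sketch assumes that points are scaled to [0, 1]^d, that neighboring datasets differ by adding or removing one record, and that Laplace noise with scale (d + 1) / epsilon is added to each cluster's count and coordinate sums. The names `laplace` and `dp_kmeans_step`, the per-iteration budget `epsilon`, and the clipping of noisy centers back into [0, 1] are assumptions made for this example. The clipping only stands in for the general idea that post-processing an already released value costs no additional privacy budget.

```python
import random


def laplace(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_kmeans_step(points, centers, epsilon, rng):
    """One noisy Lloyd iteration; points are assumed to lie in [0, 1]^d."""
    d = len(points[0])
    k = len(centers)
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k

    # Assign every point to its nearest current center (squared Euclidean).
    for p in points:
        j = min(
            range(k),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
        )
        counts[j] += 1
        for i in range(d):
            sums[j][i] += p[i]

    # One record changes one count by 1 and one sum by at most 1 per
    # dimension, so the L1 sensitivity of (counts, sums) is at most d + 1.
    scale = (d + 1) / epsilon

    new_centers = []
    for j in range(k):
        noisy_count = counts[j] + laplace(scale, rng)
        if noisy_count < 1.0:
            # Assumption: keep the previous (already public) center when the
            # noisy count is too small to divide by safely.
            new_centers.append(list(centers[j]))
            continue
        center = [(sums[j][i] + laplace(scale, rng)) / noisy_count for i in range(d)]
        # Post-processing: clip the released center back into the data domain.
        new_centers.append([min(1.0, max(0.0, v)) for v in center])
    return new_centers


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[0.1, 0.2], [0.15, 0.1], [0.2, 0.25], [0.8, 0.9], [0.85, 0.8], [0.9, 0.95]]
    centers = [[0.0, 0.0], [1.0, 1.0]]
    for _ in range(3):
        centers = dp_kmeans_step(data, centers, epsilon=2.0, rng=rng)
    print(centers)
```

With a small `epsilon` and few points, the noise dominates and the centers drift away from the true means. This is the loss of availability that the abstract describes.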
Keywords/Search Tags:K-means, Clustering, Differential Privacy, Privacy Protection, Collaborative Filtering