| With the development of internet technology, users not only access information on the internet but also generate large amounts of behavioral data that systems record. In the current era of big data, making reasonable use of users' historical behavioral data to discover its latent value is an active research topic. Cluster analysis is an important method in user behavior analysis and is often used to discover groups of users with similar interests. However, mining user behavior data with cluster analysis can leak private information. A survey found that some users are reluctant to authorize systems to use their personal data, but some of them said they would be willing to share their information if the system protected it better. How to protect users' sensitive information while still uncovering its value is therefore a problem worth studying. In this context, this paper undertakes the following work: (1) To address the privacy leakage that occurs in traditional clustering algorithms when cluster centers are updated iteratively, a new differentially private clustering algorithm is proposed, based on a review of existing differential-privacy clustering methods and an analysis of their strengths and weaknesses. (2) To address the poor clustering quality of differentially private clustering algorithms, optimizations are made in three areas: the selection of initial centers, the allocation of the privacy budget, and the computation of new cluster centers. First, an improved cluster-center correction algorithm based on roulette wheel selection is proposed, which keeps the initial cluster centers of the K-means algorithm as far apart as possible, thereby reducing the number of iterations and improving the stability of the clustering. Second, a uniform allocation algorithm based on the minimum privacy budget is proposed to improve the clustering quality of differentially private clustering algorithms. Third, because the dataset in this paper is normalized to a fixed interval, sample points may fall outside that interval once perturbation noise is added, which degrades the results; a noise correction algorithm based on the characteristics of differential privacy is therefore proposed. (3) To verify the feasibility and effectiveness of the algorithm, experiments compared the highly available HAPDPK-means++ algorithm, based on the equal-difference (arithmetic-progression) allocation method, with DPK-means++_dic (a differentially private clustering algorithm based on binary partitioning) and DPK-means++_ss (a differentially private clustering algorithm based on a series sum). The results showed that the improved algorithm significantly improves clustering performance. Finally, a privacy-preserving recommendation algorithm was implemented using the HAPDPK-means++ algorithm. The experimental results showed that it not only protects users' private information but also improves recommendation accuracy to a certain extent. |
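
The abstract compares several ways of splitting a total privacy budget ε across the T iterations of differentially private K-means. The Python sketch below illustrates four such schedules under stated assumptions, not the paper's own formulas: `uniform_budgets` gives every iteration ε/T; `dichotomy_budgets` halves the share at each step (one plausible reading of the binary-partitioning scheme behind DPK-means++_dic); `series_budgets` assumes ε_i = ε/(i(i+1)), a series whose infinite sum is ε; and `arithmetic_budgets` is a hypothetical equal-difference schedule that starts from a minimum budget `eps_min` and grows by a constant step. All function names and parameters are illustrative.

```python
def uniform_budgets(total_eps, iterations):
    # Every iteration receives the same share eps / T.
    return [total_eps / iterations] * iterations


def dichotomy_budgets(total_eps, iterations):
    # Binary partitioning (assumed form): eps/2, eps/4, ..., and the last
    # iteration repeats the final share so the list sums exactly to eps.
    budgets = [total_eps / 2 ** i for i in range(1, iterations)]
    budgets.append(total_eps / 2 ** (iterations - 1))
    return budgets


def series_budgets(total_eps, iterations):
    # Series sum (assumed form): eps_i = eps / (i * (i + 1)). The infinite
    # series telescopes to eps, so any finite prefix stays within budget.
    return [total_eps / (i * (i + 1)) for i in range(1, iterations + 1)]


def arithmetic_budgets(total_eps, iterations, eps_min):
    # Hypothetical equal-difference schedule: eps_i = eps_min + (i - 1) * step,
    # with step chosen so that the T budgets sum exactly to total_eps.
    if iterations * eps_min > total_eps:
        raise ValueError("eps_min is too large for the total budget")
    if iterations == 1:
        return [total_eps]
    step = 2 * (total_eps - iterations * eps_min) / (iterations * (iterations - 1))
    return [eps_min + i * step for i in range(iterations)]
```

For example, `arithmetic_budgets(1.0, 4, 0.1)` yields approximately `[0.1, 0.2, 0.3, 0.4]`. Because every iteration reads the whole dataset, sequential composition applies and the per-iteration budgets add up, which is why each schedule is constructed to sum to at most ε.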
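
The clustering steps described in points (1) and (2) can be illustrated with a minimal differentially private K-means sketch in the common noisy-sum/noisy-count style; it may differ from the paper's algorithm. Assumptions: data are normalized to [0, 1]^d, neighboring datasets differ by adding or removing one record, initial centers are drawn by roulette-wheel (K-means++ squared-distance) selection, and "noise correction" is modeled simply as clipping noisy centers back onto [0, 1]. The helper names `laplace`, `roulette_init`, and `dp_kmeans` are hypothetical. The seeding step reads raw data points and is not privatized here, so the sketch as a whole is not differentially private; only each noisy update step, taken on its own, is.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def roulette_init(points, k, rng):
    # Roulette-wheel seeding (K-means++ style): the first center is uniform;
    # each later center is drawn with probability proportional to its squared
    # distance to the nearest center chosen so far. NOT privatized.
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            centers.append(list(rng.choice(points)))
        else:
            centers.append(list(rng.choices(points, weights=weights, k=1)[0]))
    return centers


def dp_kmeans(points, k, budgets, rng):
    # Assumes every coordinate of every point lies in [0, 1].
    d = len(points[0])
    centers = roulette_init(points, k, rng)
    for eps in budgets:
        sums = [[0.0] * d for _ in range(k)]
        counts = [0] * k
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            counts[j] += 1
            for t in range(d):
                sums[j][t] += p[t]
        # Each record lies in exactly one cluster with coordinates in [0, 1],
        # so adding or removing it changes all (sum, count) values by at most
        # d + 1 in L1 norm; Laplace noise of scale (d + 1) / eps therefore
        # makes this update step eps-DP.
        scale = (d + 1) / eps
        new_centers = []
        for j in range(k):
            noisy_count = counts[j] + laplace(scale, rng)
            if noisy_count < 1:
                # Too few noisy members: keep the previous center.
                new_centers.append(centers[j])
                continue
            center = [(sums[j][t] + laplace(scale, rng)) / noisy_count
                      for t in range(d)]
            # Assumed noise correction: clip coordinates pushed outside [0, 1].
            new_centers.append([min(1.0, max(0.0, v)) for v in center])
        centers = new_centers
    return centers
```

A call such as `dp_kmeans(points, 2, [0.5] * 5, random.Random(0))` runs five noisy iterations with ε = 0.5 each, spending ε = 2.5 in total by sequential composition. Clipping is the simplest correction that keeps centers inside the normalized domain; it is post-processing, so it costs no extra privacy budget.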