| In the era of big data, extracting valuable information from massive datasets has become key to enterprise competitiveness. Clustering is a common technique in big data analysis, and K-prototype is a classic clustering algorithm for mixed (numerical and categorical) data. Because it is simple in principle and efficient in practice, it has been widely used in many fields. However, the data processed by the K-prototype algorithm often contain sensitive information, and malicious use of these data would seriously threaten user privacy. At present, privacy protection for clustering data relies mainly on models built around a trusted third party, but an absolutely reliable third party is difficult to find in reality. Local differential privacy (LDP) is a privacy protection technology developed in recent years that achieves decentralized privacy protection by perturbing data on the user side. Although LDP effectively addresses the problem of privacy disclosure by third parties, applying it directly to privacy protection for cluster analysis amplifies the noise because of the cluster centroid update process, which degrades clustering quality. To address these problems, this paper studies how to preserve the quality of clustering results under local differential privacy protection, and how to apply the approach to clustering-based collaborative filtering recommendation. The work covers two aspects: (1) To address the reliance of traditional privacy protection models for clustering data on trusted third parties, a local differential privacy-based K-prototype clustering method (LDPK) is proposed. The method first perturbs the data on the user side with local differential privacy and sends the perturbed data to the server, and then completes the clustering iteratively through interaction between the server and the users. While providing decentralized privacy
protection, LDPK avoids the further amplification of noise that occurs when users are clustered directly on the perturbed data, and thereby preserves the quality of the clustering results. (2) To address the leakage of user data in clustering-based collaborative filtering recommendation, a collaborative filtering recommendation model based on the LDPK mechanism is proposed. Users first complete clustering with the server through the LDPK mechanism; the server then assigns a recommendation server to each cluster; finally, each recommendation server collects data from, and produces recommendations for, the users in its cluster. In this model, user data are stored on different recommendation servers according to the clustering results, which eliminates the risk, present in the traditional recommendation model, of a single server leaking all user data, and improves the privacy and security of user data. |
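
The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the LDPK method itself, whose perturbation mechanism and server-user protocol the abstract does not specify. The first function is the standard K-prototype mixed dissimilarity (squared Euclidean distance on numeric attributes plus a weight `gamma` times the number of categorical mismatches). The second uses generalized randomized response as an assumed example of user-side LDP perturbation for one categorical attribute. The function names, `gamma`, and the choice of mechanism are illustrative assumptions.

```python
import math
import random


def k_prototype_distance(x_num, x_cat, c_num, c_cat, gamma=1.0):
    """Standard K-prototype dissimilarity between a record and a prototype:
    squared Euclidean distance on the numeric attributes plus gamma times
    the number of categorical attributes whose values differ."""
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, c_num))
    mismatches = sum(1 for a, b in zip(x_cat, c_cat) if a != b)
    return numeric + gamma * mismatches


def grr_perturb(value, domain, epsilon, rng=random):
    """Generalized randomized response, run on the user side.

    ASSUMPTION: chosen only as a common epsilon-LDP mechanism for
    categorical data; the abstract does not say LDPK uses it.
    The true value is kept with probability e^eps / (e^eps + k - 1);
    otherwise one of the other k - 1 values is reported uniformly.
    """
    k = len(domain)
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return value
    others = [v for v in domain if v != value]
    return rng.choice(others)


if __name__ == "__main__":
    # Hypothetical record with two numeric and two categorical attributes.
    d = k_prototype_distance([1.0, 2.0], ["a", "b"], [0.0, 2.0], ["a", "c"], gamma=0.5)
    print("distance to prototype:", d)

    # A user reports a perturbed category instead of the true one.
    report = grr_perturb("a", ["a", "b", "c"], epsilon=1.0)
    print("perturbed report:", report)
```

A smaller `epsilon` makes the reported value more likely to differ from the true one, which is the noise that, according to the abstract, the centroid update process amplifies when clustering runs directly on perturbed data.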