Research On Data Clustering With Differential Privacy Protection

Posted on: 2020-05-13, Degree: Master, Type: Thesis
Country: China, Candidate: C Hai, Full Text: PDF
GTID: 2428330575963118, Subject: Engineering
Abstract/Summary:
Clustering is an important technique in data mining and plays a major role in many application fields. In particular, in QoS prediction frameworks, the service clustering count is often used to measure the similarity between users; in categorical data visualization, clustering is used to find suitable aggregation points in a categorical dataset. With the rapid development of Internet technology, data containing sensitive information is growing rapidly, which poses great risks to user privacy. The privacy protection issues raised by applications of clustering algorithms have therefore attracted much attention in the academic community. Differential privacy is a privacy protection method that has emerged in recent years. By adding a small amount of noise, it guarantees the privacy of individual information while preserving the availability of the data. In QoS prediction, noise-perturbed service clustering counts are introduced, which appropriately perturbs the selection of similar users and avoids directly leaking users' individual preferences. In categorical data visualization, a differential privacy mechanism adds noise to the cluster center points at each iteration to protect the privacy of the categorical data. Focusing on clustering algorithms, this thesis applies differential privacy protection to QoS prediction and to categorical data visualization. Privacy analysis and extensive experiments show that the proposed schemes protect data privacy, effectively maintain prediction accuracy, and generate a safe view. The main work and contributions are the following two aspects:

(1) A QoS prediction algorithm based on differential privacy protection is proposed. Its main tools are the exponential mechanism and an improved covering clustering algorithm. First, the improved covering algorithm is used to compute the service clustering count, which is normalized and taken as the similarity between users. The clustering count is then used to define the utility function, and two exponential mechanisms are designed to select similar users for the target user: one selects on the utility of individual users, and the other selects a combination of multiple users according to their total utility. Finally, the missing QoS values of the target user are predicted from the selected similar users. The thesis proves that the algorithm satisfies ε-differential privacy and reports experiments on the real dataset WS-Dream. The results show that, compared with the existing technique that uses the Pearson correlation coefficient as the utility, this scheme protects data privacy and significantly improves prediction accuracy.

(2) A categorical data visualization algorithm that satisfies differential privacy is proposed. Its main tools are the Laplace mechanism and the k-modes clustering algorithm. First, an improved privacy-preserving k-modes algorithm (Improve Differential Privacy k-modes, IDP k-modes) is proposed. The dataset is divided into k subsets, and the attribute mode of each subset is taken as an initial center point, replacing the random selection of k center points used in the traditional k-modes algorithm. Experimental analysis shows that IDP k-modes improves the accuracy and stability of categorical data clustering compared with the existing DP k-modes. Then, to address both the severe overplotting of visual images caused by dense categorical data and the privacy leakage that occurs during visualization, the thesis proposes differential privacy equipartition k-modes (DPE k-modes), built on IDP k-modes. During each iteration, the number of data points in each of the k clusters is kept equal. Finally, the center point of each of the k clusters is taken as an aggregation point, and the parallel coordinates method is used to produce the visualized image. Privacy analysis proves that DPE k-modes satisfies differential privacy. Experiments on the real categorical dataset Breast Cancer verify that DPE k-modes maintains good aggregation quality for large k and outputs secure images that retain high availability and preserve the distribution, correlation, and other characteristics of the original images.
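
The two mechanisms named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the default sensitivity of 1, and the way the privacy budget is spent are all assumptions made for illustration. The first part shows the standard exponential mechanism, which selects a candidate (for example, a similar user) with probability proportional to exp(ε·u / (2·Δu)); the second shows a Laplace-perturbed mode of one categorical attribute, the kind of noisy center update a differentially private k-modes variant might use.

```python
import math
import random
from collections import Counter


def exponential_mechanism_probs(utilities, epsilon, sensitivity=1.0):
    """Return selection probabilities proportional to exp(eps * u / (2 * sensitivity))."""
    # Subtracting the maximum utility keeps the exponentials numerically stable
    # and does not change the normalized probabilities.
    u_max = max(utilities)
    weights = [math.exp(epsilon * (u - u_max) / (2.0 * sensitivity)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def select_similar_user(utilities, epsilon, sensitivity=1.0, rng=random):
    """Sample one candidate index according to the exponential mechanism."""
    probs = exponential_mechanism_probs(utilities, epsilon, sensitivity)
    return rng.choices(range(len(utilities)), weights=probs, k=1)[0]


def laplace_noise(scale, rng=random):
    """Draw Laplace(0, scale) noise as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_mode(values, categories, epsilon, sensitivity=1.0, rng=random):
    """Return the category whose Laplace-perturbed count is largest.

    Assumption: each record changes a single count by at most `sensitivity`.
    """
    counts = Counter(values)
    scale = sensitivity / epsilon
    noisy = {c: counts.get(c, 0) + laplace_noise(scale, rng) for c in categories}
    return max(noisy, key=noisy.get)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical normalized similarities between a target user and three candidates.
    similarity = [0.9, 0.4, 0.1]
    print(exponential_mechanism_probs(similarity, epsilon=1.0))
    print(select_similar_user(similarity, epsilon=1.0, rng=rng))
    # Hypothetical values of one categorical attribute inside one cluster.
    print(noisy_mode(["a", "a", "b"], ["a", "b"], epsilon=1.0, rng=rng))
```

A smaller ε flattens the selection probabilities and increases the Laplace noise, trading accuracy for stronger privacy; a larger ε makes both functions behave more like their non-private counterparts.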
Keywords/Search Tags: Covering Algorithm, Differential Privacy, k-modes, QoS Prediction, Data Visualization