Research On Fuzzy C Means Clustering Algorithm Based On Differential Privacy Protection

Posted on: 2021-06-11
Degree: Master
Type: Thesis
Country: China
Candidate: J Han
Full Text: PDF
GTID: 2558307109475994
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, user data in every industry is growing exponentially and databases are expanding day by day. Data mining can extract human-understandable patterns and knowledge from this mass of data, and it has therefore attracted wide attention. However, massive datasets contain a great deal of personal information, and collecting and releasing them carelessly will inevitably expose individual privacy. Protecting individual privacy during data mining has therefore become an important research problem in data security. Fuzzy C-means clustering is one of the typical fuzzy clustering algorithms in data mining, and user privacy can leak during its iterations. Fuzzy C-means clustering with differential privacy protection can protect the privacy of individual users while still mining patterns from the data; however, the loss of usability caused by data perturbation is a common problem of such algorithms. This thesis focuses on the availability of the clustering results and on the efficiency of the fuzzy C-means clustering algorithm based on differential privacy protection. The research contents are as follows:

1) To address the low availability of clustering results in the fuzzy C-means algorithm based on differential privacy, this thesis proposes an improved algorithm, IDPFCM (Improved Differential Privacy Fuzzy C means Clustering Algorithm). First, a privacy budget allocation method based on the Gaussian kernel function is designed: in each iteration, the privacy budget allocation ratio is calculated from the Gaussian weight of each cluster center point. This allocation method is then applied to the fuzzy C-means algorithm, so that a different amount of noise is added to each cluster center point according to its Gaussian weight, which gives the algorithm higher clustering availability under high-intensity privacy protection. Finally, experiments are conducted on public datasets and synthetic datasets. The results show that, under the same privacy protection intensity, the clustering accuracy of the proposed algorithm is higher than that of other algorithms, and the number of iterations is reduced.

2) To address the long running time of the IDPFCM algorithm in big data environments, the MR-IDPFCM algorithm is designed and implemented. This parallel algorithm uses the MapReduce computing framework: Map tasks update the memberships and compute the Gaussian values of the center points, Reduce tasks update the cluster center points and accumulate the Gaussian values, and the main function calculates the Gaussian weights, applies the Laplace mechanism to achieve differential privacy protection, and controls the iterative execution of the algorithm. Experiments on different datasets show that the MR-IDPFCM algorithm has a clear advantage in running efficiency.
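
The idea of one iteration described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not give the formulas, so the Gaussian value of a center is assumed here to be the sum of Gaussian-kernel similarities between that center and all data points, the budget share of a center is assumed to be its normalized Gaussian value, and Laplace noise is assumed to be added directly to each center coordinate. The names `gaussian_weights`, `noisy_iteration`, `sigma`, `sensitivity` and `epsilon_iter` are hypothetical, and the sketch makes no claim of a formal privacy guarantee.

```python
import math
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def memberships(points, centers, m=2.0):
    # Standard fuzzy C-means membership update with fuzzifier m > 1.
    power = 2.0 / (m - 1.0)
    rows = []
    for x in points:
        dists = [max(math.dist(x, c), 1e-12) for c in centers]
        rows.append([1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists])
    return rows


def update_centers(points, u, m=2.0):
    # Standard fuzzy C-means center update: mean weighted by membership ** m.
    dim, n_clusters = len(points[0]), len(u[0])
    centers = []
    for j in range(n_clusters):
        weights = [row[j] ** m for row in u]
        total = sum(weights)
        centers.append(
            [sum(w * x[d] for w, x in zip(weights, points)) / total for d in range(dim)]
        )
    return centers


def gaussian_weights(points, centers, sigma=1.0):
    # ASSUMPTION: Gaussian value of a center = sum of Gaussian-kernel
    # similarities to all points; weights are the normalized values.
    values = [
        max(sum(math.exp(-math.dist(x, c) ** 2 / (2.0 * sigma ** 2)) for x in points), 1e-12)
        for c in centers
    ]
    total = sum(values)
    return [v / total for v in values]


def noisy_iteration(points, centers, epsilon_iter, sensitivity=1.0,
                    sigma=1.0, m=2.0, rng=None):
    # One iteration: membership update, center update, then per-center
    # Laplace noise whose scale depends on that center's budget share.
    if rng is None:
        rng = random.Random()
    u = memberships(points, centers, m)
    new_centers = update_centers(points, u, m)
    shares = gaussian_weights(points, new_centers, sigma)
    noisy = []
    for center, share in zip(new_centers, shares):
        eps_j = epsilon_iter * share      # ASSUMPTION: budget split by Gaussian weight
        scale = sensitivity / eps_j       # Laplace mechanism: scale = sensitivity / epsilon
        noisy.append([coord + laplace(scale, rng) for coord in center])
    return noisy, shares


if __name__ == "__main__":
    points = [[0.1, 0.2], [0.15, 0.1], [0.9, 0.8], [0.85, 0.95]]
    centers = [[0.0, 0.0], [1.0, 1.0]]
    noisy, shares = noisy_iteration(points, centers, epsilon_iter=1.0,
                                    rng=random.Random(0))
    print("budget shares:", shares)
    print("noisy centers:", noisy)
```

In this sketch a center with a larger Gaussian weight receives a larger share of the per-iteration budget and therefore less noise. The per-point sums inside `memberships`, `update_centers` and `gaussian_weights` are the parts that a Map and Reduce split, as described in item 2), could compute in parallel.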
Keywords/Search Tags: data mining, fuzzy C means clustering, differential privacy protection, Gaussian kernel function, MapReduce