
Research On Differential Privacy Protection Based On Clustering

Posted on: 2020-07-11
Degree: Master
Type: Thesis
Country: China
Candidate: L L Wang
GTID: 2428330575455437
Subject: Computer technology
Abstract/Summary:
In the era of big data, collecting, storing, analyzing and distributing all kinds of data has become simple and convenient, and information now spreads quickly, in many forms and over a wide range. This has become an important driver of social progress and economic growth. However, once private information falls into the hands of illegal intruders, it can cause immeasurable losses to businesses, governments and countries. Data therefore needs privacy processing before it is published, analyzed or mined. Differential privacy, a privacy-preserving data publishing model, assumes a very strict attack model in which the adversary has maximal background knowledge. Backed by a rigorous mathematical foundation, it protects data by adding an appropriate amount of noise to the original data set.

An analysis of one privacy-preserving data mining (PPDM) technique, the traditional K-means clustering algorithm under differential privacy protection, shows that the key point of privacy leakage in the clustering process is that the algorithm cannot correctly select suitable cluster centers for privacy protection. Because the requirements on selecting cluster points are low, the data is protected only through the random positions of the cluster centers. Although this traditional technique does offer some protection for data security, the selection of initial points and center points during clustering is random and limited, which not only reduces the accuracy and usability of the clustering but also distorts the noise-added results of the differential privacy model. The main content of this dissertation is therefore an improvement of the clustering algorithm under differential privacy protection, with the goal of achieving high availability and high accuracy while still satisfying differential privacy. The following research work is carried out on these issues:

(1) Based on the clustering effect and the initial-center selection of the K-means algorithm, an improved K-means clustering algorithm based on K-modes is proposed. To address a shortcoming of traditional K-means, the idea of the mode from the K-modes algorithm is used to select the initial points: the attribute values of each point are compared with the attribute values of the cluster centers, and the data point whose current attribute values differ the most is taken as an initial point. Each remaining data point is then assigned, as in K-means, to the cluster whose center is nearest in Euclidean distance, and the clusters are recomputed. Because of these improvements to initial-point selection and cluster distance, the algorithm improves the accuracy and effect of clustering.

(2) To address the shortcomings of the traditional differentially private K-means algorithm, this dissertation proposes a differential privacy protection model for the improved K-means clustering algorithm. The traditional differentially private K-means algorithm adds noise to the cluster centers of the data set, so the deviation of the centers can grow as the number of iterations increases, which keeps clustering accuracy from reaching a higher level. The Laplace noise method is therefore improved: by changing the distance of each data sample from the center point, the specific position of the sensitive attribute in the sample is obtained, which changes the order in which noise is added.

(3) Three groups of comparative experiments were evaluated with three criteria: clustering effect, F-measure and clustering convergence speed. The simulation results show that clustering effect, clustering precision and time complexity are related, and that the proposed methods have clear advantages over the related methods proposed in previous work.

Figures [12], tables [7], references [52]...
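For readers unfamiliar with the "traditional" differentially private K-means that item (2) improves on, the following is a minimal sketch of one common way to build it (in the style of the DPLloyd approach), not the dissertation's improved model. It assumes data scaled to [0, 1]^d, data-independent random initial centers, the total budget `epsilon` split evenly across iterations (sequential composition), and, within each iteration, half of that share spent on per-cluster counts (sensitivity 1) and half on per-cluster coordinate sums (L1 sensitivity d). Clusters are disjoint, so parallel composition covers all clusters at once. The names `dp_kmeans`, `laplace`, `nearest`, `iterations` and `seed` are hypothetical.

```python
import random
from typing import List, Sequence


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def nearest(p: Sequence[float], centers: Sequence[Sequence[float]]) -> int:
    """Index of the center with the smallest squared Euclidean distance to p."""
    return min(range(len(centers)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))


def dp_kmeans(data: Sequence[Sequence[float]], k: int, epsilon: float,
              iterations: int = 5, seed=None) -> List[List[float]]:
    """Baseline DP K-means sketch; assumes every coordinate lies in [0, 1]."""
    rng = random.Random(seed)
    d = len(data[0])
    eps_iter = epsilon / iterations          # budget per iteration
    count_scale = 1.0 / (eps_iter / 2)       # count sensitivity is 1
    sum_scale = d / (eps_iter / 2)           # L1 sensitivity of a sum is d
    # Random initial centers do not touch the data, so they cost no budget.
    centers = [[rng.random() for _ in range(d)] for _ in range(k)]
    for _ in range(iterations):
        sums = [[0.0] * d for _ in range(k)]
        counts = [0] * k
        for p in data:                        # ordinary K-means assignment
            j = nearest(p, centers)
            counts[j] += 1
            for t in range(d):
                sums[j][t] += p[t]
        new_centers = []
        for j in range(k):
            noisy_count = counts[j] + laplace(count_scale, rng)
            if noisy_count < 1.0:
                # Count too small or negative after noise: keep the old center
                # (a decision based only on noisy output, i.e. post-processing).
                new_centers.append(centers[j])
                continue
            center = [(sums[j][t] + laplace(sum_scale, rng)) / noisy_count
                      for t in range(d)]
            # Clip back into the assumed [0, 1] domain.
            new_centers.append([min(1.0, max(0.0, x)) for x in center])
        centers = new_centers
    return centers
```

A call such as `dp_kmeans(points, k=3, epsilon=1.0, iterations=5, seed=42)` returns noisy centers. Because fresh noise enters the centers in every iteration and each iteration's share of the budget shrinks as `iterations` grows, center error tends to increase with more iterations, which is the effect item (2) describes.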
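Item (3) uses the F-measure as an evaluation criterion, but the abstract does not say which variant. The sketch below assumes a commonly used class-weighted definition: each true class is matched with the cluster that gives the highest harmonic mean of precision and recall, and these best scores are averaged with weights proportional to class size. The function name `clustering_f_measure` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def clustering_f_measure(labels_true: Sequence[Hashable],
                         labels_pred: Sequence[Hashable]) -> float:
    """Class-size-weighted best-match F-measure of a clustering (assumed variant)."""
    n = len(labels_true)
    class_sizes = Counter(labels_true)
    cluster_sizes = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))  # points per (class, cluster)
    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = joint.get((c, k), 0)
            if n_ck == 0:
                continue
            precision = n_ck / n_k   # share of cluster k that belongs to class c
            recall = n_ck / n_c      # share of class c captured by cluster k
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_c / n) * best
    return total
```

Cluster IDs do not need to match class IDs: `clustering_f_measure([0, 0, 1, 1], [1, 1, 0, 0])` scores a perfect clustering as 1.0, while putting every point into one cluster is penalized through low precision for the minority class.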
Keywords/Search Tags:K-means clustering algorithm, data mining, differential privacy, K-modes algorithm, privacy protection technology