
Privacy Preserving Research For Multiple Sensitive Attribute Data Publishing

Posted on: 2019-12-01
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhou
Full Text: PDF
GTID: 2428330572955593
Subject: Computer application technology
Abstract/Summary:
The development of Internet technology has driven progress in data mining, artificial intelligence, and related technologies, and has broadened and deepened their use in daily life, bringing great convenience to production and everyday activities. The spread of these applications has made data collection and sharing easier. Shared data feeds new research, which in turn advances the technology, and together these form a complete data ecosystem. Over the past two years, as data sharing and interaction have increased, privacy leaks and related problems have occurred frequently. How to keep released data both secure and valuable for mining has become an important question, and the academic community has studied it extensively in recent years. After introducing the development history, basic concepts, research progress, and classic models and algorithms of privacy protection technology, this thesis proposes a privacy-preserving publishing algorithm for data with multiple sensitive attributes and three improved clustering algorithms. The specific work is as follows.

First, data publishing algorithms based on clustering can minimize the information lost when data is released, so clustering has been widely used in research on privacy protection algorithms in recent years. This thesis reviews recent clustering-based algorithms for protecting multiple sensitive attributes and points out that the choice of clustering algorithm strongly affects the accuracy of the privacy-preserving algorithm. At present, clustering-based anonymization algorithms usually use the k-means algorithm or one of its variants for the clustering step, but these algorithms easily fall into local optima and depend heavily on the choice of k. To address this problem, after introducing the k-means algorithm and its variants in detail, this thesis proposes three improved algorithms based on the bisecting k-means algorithm: the centroid redivision bisecting k-means algorithm (CRBK), the record redivision bisecting k-means algorithm (RRBK), and the self-adaptive record redivision bisecting k-means algorithm (SRRBK). CRBK and RRBK address the tendency of bisecting k-means to fall into local optima by re-planning the global centroids and the records, respectively, after each bisecting operation. SRRBK removes the dependence of bisecting k-means on k by introducing the concepts of information loss convergence and cluster convergence. All three algorithms are evaluated in simulation experiments. The results show that the improved algorithms perform well in clustering accuracy, stability, and information loss.

Second, because current privacy publishing models for multiple sensitive attributes cannot handle data release when the sensitive attribute sets are highly diverse, the (l, x, w)-diversity model is proposed. The model introduces the concept of information entropy and protects sensitive attributes by constraining the diversity and uniformity of each equivalence group on its sensitive attributes. In addition, because lossy joins tend to undermine data mining on the released data, this thesis proposes the Entropy-Based l-Diversity Clustering (EBLC) algorithm, which uses a lossless-join publishing strategy. The algorithm is built on clustering: records are first clustered by their non-sensitive attributes; within each cluster, equivalence groups that satisfy (l, x, w)-diversity are then formed according to the sensitive attributes; finally, all equivalence groups are generalized to produce the released data. The EBLC algorithm is also evaluated in simulation experiments on information loss, running efficiency, and resistance to attacks, and the results show that it performs well with respect to information loss and data mining.
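
The abstract does not define the (l, x, w)-diversity model, so the following is only a minimal sketch of the standard entropy l-diversity condition it appears to build on: an equivalence group passes if the Shannon entropy of its sensitive values is at least log(l). Applying the check separately to each sensitive attribute is an assumption made for illustration, and the function names and sample records are hypothetical, not taken from the thesis.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (natural log) of the empirical distribution of `values`."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def group_is_entropy_l_diverse(group, l, tol=1e-12):
    """Check one equivalence group against entropy l-diversity.

    `group` is a list of records; each record is a tuple of sensitive values,
    one per sensitive attribute. Assumption: every sensitive attribute must
    individually reach entropy >= log(l). `tol` absorbs float rounding.
    """
    threshold = math.log(l)
    columns = zip(*group)  # one tuple of values per sensitive attribute
    return all(entropy(col) >= threshold - tol for col in columns)


def table_is_entropy_l_diverse(groups, l):
    """A published table passes only if every equivalence group passes."""
    return all(group_is_entropy_l_diverse(g, l) for g in groups)


# Hypothetical records: (disease, income band)
balanced = [("flu", "low"), ("cancer", "high"), ("hiv", "mid")]
skewed = [("flu", "low"), ("flu", "high"), ("cancer", "mid")]

ok_balanced = group_is_entropy_l_diverse(balanced, 3)
ok_skewed = group_is_entropy_l_diverse(skewed, 2)
```

In this sketch the balanced group has three equally frequent values per attribute, so its entropy equals log(3). The skewed group repeats one disease value, which pulls the entropy of that attribute below log(2) even though two distinct values are present. This is the sense in which an entropy constraint limits uniformity as well as the count of distinct values.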
Keywords/Search Tags: l-diversity, privacy protection, multi-dimensional sensitive attributes, clustering, data security