Research On Multidimensional Sensitive Dataset Anonymity Approach Based On Probabilistic Graph

Posted on: 2018-08-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Guo
GTID: 2348330542958184
Subject: Software engineering
Abstract/Summary:
With the arrival of the big data era, data has become the most valuable product of the twenty-first century, and its output has never stopped and has even grown explosively. With the rapid development and wide application of data mining technology, large amounts of data from the political, economic, cultural, educational, and medical fields are released for commercial decision-making, public opinion analysis, and disease prediction, and show very high social and economic value. These data contain a great deal of sensitive personal information, so anonymizing them before publication is very important. The goal is not to hide the entire data set, but to break the link between individuals and their sensitive information so that an attacker cannot infer an individual's private information. This matters especially for multidimensional sensitive attribute datasets (MSA-Datasets), where the correlation among the sensitive attributes may reveal hidden information. In addition, as the populations of most countries age, demand for medical services has become extremely urgent and produces a huge volume of medical data, most of which has multiple sensitive attributes. Therefore, in the age of big medical data, and because of its large social and economic benefits, scholars at home and abroad have paid increasing attention to privacy protection for multidimensional sensitive attribute data. However, existing anonymization methods are simple extensions of methods designed for a single sensitive attribute, which can leave data utility too low or fail to guarantee data security when protecting individuals' sensitive attribute information. To meet the demand for publishing multidimensional sensitive attribute data, this paper proposes a novel probabilistic multipartite graph privacy protection method for multidimensional sensitive attribute data.

The main research work is as follows.

First, we analyze the problems of existing privacy protection models for multidimensional sensitive attribute data and point out that current methods mainly focus on improving models built for data with a single sensitive attribute and seldom consider the characteristics of multidimensional sensitive data. This work therefore focuses on the utility of the anonymized data, aiming to reduce the loss of sensitive attribute information.

Second, based on the characteristics of multidimensional sensitive attribute data, we propose a novel representation that reduces data redundancy: a multipartite graph in which each user node is labeled with its quasi-identifier attributes and the relationships between the multidimensional attributes are described by edges. Converting the multidimensional sensitive attribute data table into a multipartite graph achieves an initial anonymization, and by analyzing the multipartite graph we give methods for measuring information loss and privacy risk.

Third, combining the multidimensional sensitive attribute data with the multipartite graph, we design a multipartite graph privacy protection model based on a probabilistic graphical model. Clustering and grouping reduce information loss, and added probabilistic edges express the degree of correlation between attributes, so that the correlation between sensitive attributes is preserved as far as possible while the privacy requirements for data publishing are met. This makes analysis results more accurate and greatly improves the utility of the anonymized data.

Finally, we present the system design and the detailed implementation of the anonymization algorithm. We evaluate the method in terms of both security and utility on a real dataset, and the experimental results show that it ensures data security and retains the association rules between attributes, improving the utility of the data.
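
The representation described above can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the toy records, the attribute names `disease` and `drug`, the function names, and the fixed-size chunking used in place of clustering are all assumptions made for illustration. The sketch turns a table into multipartite edges (user node to sensitive value, one partition per sensitive attribute), then replaces per-user edges with group-level edges weighted by the fraction of group members holding each value.

```python
from collections import Counter

# Hypothetical toy records: quasi-identifiers plus two sensitive attributes.
RECORDS = [
    {"qi": ("30-39", "F"), "disease": "flu", "drug": "A"},
    {"qi": ("30-39", "M"), "disease": "flu", "drug": "B"},
    {"qi": ("40-49", "F"), "disease": "asthma", "drug": "C"},
    {"qi": ("40-49", "M"), "disease": "flu", "drug": "A"},
]
SENSITIVE = ("disease", "drug")  # assumed sensitive attribute names


def build_multipartite_edges(records, sensitive=SENSITIVE):
    """Return one edge (user_index, attribute, value) per record and sensitive attribute."""
    return [(i, attr, rec[attr]) for i, rec in enumerate(records) for attr in sensitive]


def probabilistic_group_edges(records, k, sensitive=SENSITIVE):
    """Map (group_id, attribute, value) to the share of group members with that value.

    Users are grouped into consecutive chunks of size k; this is a placeholder
    for a real clustering step, not the grouping used in the thesis.
    """
    probs = {}
    for group_id, start in enumerate(range(0, len(records), k)):
        group = records[start:start + k]
        for attr in sensitive:
            counts = Counter(rec[attr] for rec in group)
            for value, n in counts.items():
                probs[(group_id, attr, value)] = n / len(group)
    return probs


if __name__ == "__main__":
    for edge in build_multipartite_edges(RECORDS):
        print("edge:", edge)
    for key, p in probabilistic_group_edges(RECORDS, k=2).items():
        print("probabilistic edge:", key, p)
```

In this sketch an attacker who locates a user's group sees only a distribution over sensitive values, while the edge weights still carry some of the co-occurrence information that association analysis relies on. How groups are formed and how risk is measured would follow the thesis itself.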
Keywords/Search Tags: data publishing, MSA-Dataset, multipartite graphs, probabilistic graphs, privacy preserving