Research On The Improved Method Of M-invariance Algorithm In Continuous Data Publishing

Posted on: 2016-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: M Wang
Full Text: PDF
GTID: 2298330467998798
Subject: Computer software and theory
Abstract/Summary:
As the era of big data arrives, an increasing amount of data can be mined to reveal potential information about users from the databases published by governments or companies. However, data in its original form contains sensitive information about individuals, and publishing such data violates individual privacy. Privacy-preserving data publishing (PPDP) is used to publish useful information while preserving data privacy.

On the one hand, PPDP technologies anonymize the data to prevent adversaries from inferring individuals' sensitive information from the published tables, thereby achieving privacy preservation. On the other hand, because the anonymized data will be used for data mining, PPDP needs to maximize the utility of the data so that analysis based on the anonymized data remains highly accurate. It is therefore important for PPDP to find a trade-off between privacy and utility.

Currently, most PPDP technologies consider only one-time data publication. In reality, however, data is updated rapidly, which leads to continuous data publication. If adversaries link the previously published tables together, they may infer the sensitive information of a target individual, so the privacy-preservation problem arises again in continuous data publication.

In this paper, we study the classic algorithms m-invariance and Slicing for the continuous data publication problem. The m-invariance algorithm uses generalization to modify the original table, which unavoidably results in information loss; the loss is especially large when handling multidimensional data. Besides, m-invariance is inadequate in terms of privacy protection. Slicing is an advanced data anonymization technique that protects individual privacy by breaking the association between quasi-identifiers and a sensitive attribute while preserving the original information of the data.
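For orientation, the abstract does not spell out the m-invariance principle. As it is commonly defined in the literature, every QI group in a release must hold at least m tuples with pairwise distinct sensitive values, and an individual who appears in consecutive releases must fall into groups with the same set of sensitive values each time. The following Python sketch checks that condition on toy data; the function names, the release layout, and the sample records are assumptions made for illustration and are not taken from the thesis.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical layout: a release maps a group id to its
# (person_id, sensitive_value) tuples.
Release = Dict[str, List[Tuple[str, str]]]


def is_m_unique(release: Release, m: int) -> bool:
    """Each group has at least m tuples, all with distinct sensitive values."""
    for rows in release.values():
        values = [sensitive for _, sensitive in rows]
        if len(rows) < m or len(set(values)) != len(values):
            return False
    return True


def signatures(release: Release) -> Dict[str, FrozenSet[str]]:
    """Map each person to the set of sensitive values in their group."""
    result: Dict[str, FrozenSet[str]] = {}
    for rows in release.values():
        group_signature = frozenset(sensitive for _, sensitive in rows)
        for person_id, _ in rows:
            result[person_id] = group_signature
    return result


def is_m_invariant(releases: List[Release], m: int) -> bool:
    """Check m-uniqueness per release and signature stability across
    consecutive releases (assumes a person's lifespan is contiguous)."""
    if not all(is_m_unique(release, m) for release in releases):
        return False
    for previous, current in zip(releases, releases[1:]):
        sig_prev, sig_curr = signatures(previous), signatures(current)
        for person_id in sig_prev.keys() & sig_curr.keys():
            if sig_prev[person_id] != sig_curr[person_id]:
                return False
    return True


# Toy data (invented): bob leaves and erin joins between the releases.
release_1: Release = {
    "g1": [("alice", "flu"), ("bob", "hiv")],
    "g2": [("carol", "flu"), ("dave", "cancer")],
}
release_2: Release = {
    "g1": [("alice", "flu"), ("erin", "hiv")],
    "g2": [("carol", "flu"), ("dave", "cancer")],
}
# Regrouping alice with dave changes her signature between releases.
release_2_bad: Release = {
    "g1": [("alice", "flu"), ("dave", "cancer")],
    "g2": [("carol", "flu"), ("erin", "hiv")],
}
```

In this toy setup `is_m_invariant([release_1, release_2], 2)` holds, while replacing the second release with `release_2_bad` fails the signature check even though each release is m-unique on its own.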
In this paper, we consider the limitations of m-invariance and combine them with the advantages of Slicing to present a novel algorithm, m-slicing. The details of the work are as follows.

Firstly, partition the attributes of the original table according to attribute correlation. This preserves the correlation among attributes, which is good for data mining, and breaks the connection between uncorrelated attributes to prevent privacy attacks. It also reduces the dimensionality of the data, so that high-dimensional data can be handled. Secondly, partition the QI attribute in the sensitive group by its domain values, so that data mining tasks yield clearer results. Thirdly, partition the tuples into QI groups. Each group should obey the m-invariance principle, which effectively limits the risk of privacy disclosure in re-publication. Lastly, permute each attribute group within every QI group so as to prevent attribute linkage and table linkage attacks.

This paper uses real data sets for experiments and data mining models to examine the utility of the published tables. Compared with the original algorithm, m-slicing greatly reduces the probability of privacy disclosure, provides stronger privacy protection, and improves the utility of the anonymized data. The advantage of m-slicing is more obvious when dealing with real-world multidimensional data. In conclusion, m-slicing is superior to m-invariance when both privacy protection and data utility are taken into account.
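The permutation step described above can be illustrated with a small Python sketch of generic slicing. It is not the thesis's m-slicing implementation: the attribute groups and the QI groups (buckets) are assumed to be given, so neither the correlation-based attribute partitioning nor the m-invariance grouping is reproduced, and all names and records here are hypothetical.

```python
import random
from typing import Dict, List, Sequence

Row = Dict[str, object]


def slice_bucket(rows: List[Row],
                 column_groups: Sequence[Sequence[str]],
                 rng: random.Random) -> List[Row]:
    """Shuffle each column group independently inside one bucket.

    Values within a column group stay together, so their correlation is
    kept; the link between different column groups is broken.
    """
    shuffled_pieces = []
    for columns in column_groups:
        piece = [tuple(row[c] for c in columns) for row in rows]
        rng.shuffle(piece)
        shuffled_pieces.append(piece)

    sliced: List[Row] = []
    for i in range(len(rows)):
        new_row: Row = {}
        for columns, piece in zip(column_groups, shuffled_pieces):
            new_row.update(zip(columns, piece[i]))
        sliced.append(new_row)
    return sliced


def publish(buckets: List[List[Row]],
            column_groups: Sequence[Sequence[str]],
            seed: int = 0) -> List[List[Row]]:
    """Apply the per-bucket permutation to every bucket."""
    rng = random.Random(seed)
    return [slice_bucket(bucket, column_groups, rng) for bucket in buckets]


# Assumed inputs: two attribute groups and two pre-formed buckets.
column_groups = [("age", "sex"), ("zipcode", "disease")]
buckets = [
    [
        {"age": 23, "sex": "M", "zipcode": "11000", "disease": "flu"},
        {"age": 27, "sex": "F", "zipcode": "13000", "disease": "hiv"},
        {"age": 35, "sex": "M", "zipcode": "12000", "disease": "cancer"},
    ],
    [
        {"age": 52, "sex": "F", "zipcode": "54000", "disease": "flu"},
        {"age": 58, "sex": "M", "zipcode": "55000", "disease": "asthma"},
    ],
]

published = publish(buckets, column_groups, seed=42)
```

Each published bucket contains the same `(age, sex)` pairs and the same `(zipcode, disease)` pairs as its input bucket, but the pairing between the two groups within a row is no longer reliable, which is the de-association the abstract attributes to Slicing.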
Keywords/Search Tags: Data Publishing, Privacy Preservation, Data Anonymization, m-invariance Algorithm, Slicing Algorithm