
Privacy Preserving Based On The Taxonomy Tree For Set-valued Data Publishing

Posted on: 2018-12-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Hu
GTID: 2348330536952517
Subject: Software engineering
Abstract/Summary:
With the rapid development of data technology, more and more types of data are being published. Data mining and analysis techniques help extract valuable content quickly from large data sets, which further drives the development of data publishing technology. Set-valued data (such as credit card transactions, user records, supermarket shopping records, and hospital electronic records) is a major form of information storage and one of the main data types in data release, and it carries a great deal of information. Publishing linear statistics over such set-valued data, however, poses a serious threat to the sensitive information the data contains. How to publish these linear statistical results while ensuring that individuals' sensitive information is not compromised is therefore an important problem. From published linear statistics, an attacker with strong background knowledge can infer individuals' sensitive information with high probability, and such high-confidence inference attacks are one of the main causes of privacy disclosure today. For set-valued data, traditional anonymization methods do not defend well against these attacks. Differential privacy, a newer model whose guarantee does not depend on the attacker's background knowledge, fundamentally complements traditional anonymity models. Because real published data sets are updated incrementally, this thesis takes both static and dynamic set-valued data as its research objects and studies how to apply differential privacy to build effective privacy-preserving publishing methods that improve the utility of the published data. The main work is as follows.

(1) The principle by which the differential privacy model protects data privacy is analyzed, and advanced privacy protection and data publishing techniques in China and abroad are compared. On this basis, the advantages, disadvantages, and scope of application of the differential privacy model and of data publishing frameworks are summarized.

(2) For privacy-preserving release of static set-valued data, the factors that influence noise addition in taxonomy-tree methods are analyzed, and a publishing method is proposed that uses information gain as the utility function when splitting nodes to build the taxonomy tree. The data set is first generalized. Spending the privacy budget, the method uses the information-gain utility function to select a splitting scheme for each node of the taxonomy tree, computes the feasible sub-partitions of each split with the exponential mechanism, and retains the best split. Finally, Laplace noise is added to the leaf nodes of the taxonomy tree to guarantee the privacy of the data set.

(3) Real set-valued data arrives continuously and without bound, so static set-valued data publishing algorithms can no longer be used. A dynamic set-valued data publishing algorithm based on a differentially private taxonomy tree is therefore proposed. The algorithm first constructs an optimal taxonomy tree from the set of all items in the data set and selects the most closely related itemsets. A boundary value is then set to limit incremental updates of the data: new records are added at the root node of the taxonomy tree and assigned iteratively using the same allocation method as the initial tree, and finally Laplace noise is added to the leaf nodes.

(4) Differentially private data release experiments are run on two real data sets under different parameter settings, and the relative error of the released data is analyzed to verify the usability and effectiveness of the proposed methods.

This thesis studies two differential privacy schemes, one for static and one for dynamic set-valued data release, which effectively improve the accuracy of the published data while guaranteeing data privacy.
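Steps (1) to (3) rely on standard differential-privacy building blocks that the abstract names but does not define. For reference, their usual textbook forms are collected below. The notation (a mechanism M, neighboring data sets D and D' that differ in one record, a query f, a utility u, sensitivities Δf and Δu) is the common notation of the literature and is assumed here rather than taken from the thesis.

```latex
% epsilon-differential privacy, for all neighboring D, D' and all output sets S:
\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon}\, \Pr[\mathcal{M}(D') \in S]

% Laplace mechanism for a numeric query f with L1 sensitivity \Delta f:
\Delta f = \max_{D, D'} \lVert f(D) - f(D') \rVert_1, \qquad
\mathcal{M}(D) = f(D) + \mathrm{Lap}\!\left(\frac{\Delta f}{\varepsilon}\right)

% Exponential mechanism over candidate outputs r, utility u, sensitivity \Delta u:
\Pr[\mathcal{M}(D) = r] \propto \exp\!\left(\frac{\varepsilon\, u(D, r)}{2\,\Delta u}\right)

% Sequential composition: mechanisms with budgets \varepsilon_1, \ldots, \varepsilon_k
% applied to the same data are jointly (\textstyle\sum_i \varepsilon_i)-differentially private.
```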
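The split selection in step (2) can be illustrated with the exponential mechanism. This is a hypothetical sketch, not the thesis's method: the abstract does not define its information-gain utility, so `information_gain` is a stand-in that measures how much the entropy of the item distribution drops when a partition's records are divided among the sub-partitions of a candidate split. The utility sensitivity is passed in as a parameter that the caller must bound (1.0 in the example is an assumption).

```python
import math
import random
from collections import Counter

def item_entropy(records):
    """Shannon entropy (in bits) of the item-frequency distribution of a list of itemsets."""
    counts = Counter(item for record in records for item in record)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(parent, children):
    """Hypothetical utility: entropy drop when `parent` is split into `children`.

    `children` must partition the records of `parent`. Each child is weighted by
    its share of item occurrences, so by concavity of entropy the result is
    non-negative up to floating-point rounding.
    """
    total_items = sum(len(r) for r in parent)
    if total_items == 0:
        return 0.0
    weighted = sum(
        sum(len(r) for r in child) / total_items * item_entropy(child)
        for child in children
    )
    return item_entropy(parent) - weighted

def exponential_mechanism(candidates, utility, epsilon, sensitivity, rng=None):
    """Sample a candidate with probability proportional to exp(eps * u / (2 * sensitivity))."""
    rng = random.Random() if rng is None else rng
    scores = [epsilon * utility(c) / (2.0 * sensitivity) for c in candidates]
    top = max(scores)  # shift by the maximum so exp() cannot overflow
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]

# Hypothetical partition at one taxonomy node and two candidate splits of it.
parent = [{"bread", "milk"}, {"bread"}, {"beer", "wine"}, {"wine"}]
candidates = [
    ("Food vs Drink", [parent[:2], parent[2:]]),  # separates disjoint item groups
    ("no split", [parent]),                       # leaves the partition unchanged
]

def split_utility(candidate):
    return information_gain(parent, candidate[1])

chosen = exponential_mechanism(candidates, split_utility, epsilon=1.0,
                               sensitivity=1.0, rng=random.Random(0))
print(chosen[0])
```

Because the selection probability grows with exp(ε·u/2Δu), a larger ε makes the highest-gain split almost certain to be chosen, while a small ε keeps lower-gain splits plausible and so hides which split the data favors.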
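The final step of (2) and (3), adding Laplace noise to the leaf nodes, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names and leaf labels are hypothetical, each record is assumed to fall in exactly one leaf (so adding or removing one record changes a single count by 1, giving sensitivity 1 and noise scale 1/ε), and the released leaf set is assumed to be fixed independently of the data, empty leaves included.

```python
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale): the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_leaf_counts(leaf_counts, epsilon, rng=None):
    """Return leaf counts perturbed with Laplace(1/epsilon) noise.

    Assumes every record lies in exactly one leaf (L1 sensitivity 1) and that
    `leaf_counts` lists every leaf of the tree, empty ones included, so the set
    of released keys does not itself depend on the data.
    """
    rng = random.Random() if rng is None else rng
    scale = 1.0 / epsilon
    return {leaf: count + laplace_noise(scale, rng) for leaf, count in leaf_counts.items()}

# Hypothetical leaves of a taxonomy tree and their true record counts.
leaves = {"{Food}": 120, "{Drink}": 75, "{Food, Drink}": 40}
print(noisy_leaf_counts(leaves, epsilon=0.5, rng=random.Random(42)))
```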
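Step (3) can be sketched as a publisher that keeps the taxonomy cut fixed once the initial tree is built, routes each batch of new records from the root to a leaf, and releases noisy cumulative leaf counts. Everything here is an assumption for illustration: `DynamicTreePublisher`, `cut`, and `max_updates` are hypothetical names, the "boundary value" is read as a cap on the number of releases, and the budget is split evenly across releases by sequential composition. That keeps the total at ε, because each record changes one leaf count by 1 in every later release and each release spends ε divided by the cap.

```python
import random
from collections import Counter
from itertools import combinations

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

class DynamicTreePublisher:
    """Hypothetical incremental publisher over a fixed taxonomy cut.

    `cut` maps each raw item to the taxonomy node chosen when the tree was
    built; a record's leaf is the set of nodes its items generalize to.
    `max_updates` plays the role of the boundary value (an assumed reading).
    """

    def __init__(self, cut, epsilon, max_updates, seed=None):
        self.cut = cut
        self.max_updates = max_updates
        # Sequential composition: every release spends an equal share of epsilon.
        self.eps_per_update = epsilon / max_updates
        nodes = sorted(set(cut.values()))
        # Every non-empty combination of cut nodes is a leaf, fixed in advance
        # so the released keys never depend on the data.
        self.leaves = [frozenset(c) for k in range(1, len(nodes) + 1)
                       for c in combinations(nodes, k)]
        self.counts = Counter({leaf: 0 for leaf in self.leaves})
        self.updates = 0
        self.rng = random.Random(seed)

    def leaf_of(self, record):
        """Route a record from the root to its leaf by generalizing each item."""
        return frozenset(self.cut[item] for item in record)

    def update(self, new_records):
        """Add a batch at the root, assign it to leaves, and release noisy totals."""
        if self.updates >= self.max_updates:
            raise RuntimeError("boundary reached: privacy budget is used up")
        self.updates += 1
        self.counts.update(self.leaf_of(r) for r in new_records)
        scale = 1.0 / self.eps_per_update
        return {leaf: self.counts[leaf] + laplace_noise(scale, self.rng)
                for leaf in self.leaves}

# Hypothetical cut: raw items generalized to two taxonomy nodes.
cut = {"bread": "Food", "milk": "Food", "beer": "Drink", "wine": "Drink"}
pub = DynamicTreePublisher(cut, epsilon=1.0, max_updates=2, seed=7)
first = pub.update([{"bread", "milk"}, {"wine"}])
second = pub.update([{"beer", "bread"}])
print(second)
```

Enumerating every combination of cut nodes is only practical for a small cut; it is used here because it makes the set of released leaves independent of the data.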
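The evaluation in step (4) reports relative error. A common definition, assumed here because the abstract does not give one, divides the absolute error of each counting query by the larger of its true answer and a sanity bound s, so that queries with very small true counts do not dominate the average: RE = |noisy − true| / max(true, s). The function names, the query labels, and the default of s = 0.1% of the data set size are hypothetical choices for illustration.

```python
def relative_error(true_count, noisy_count, sanity_bound):
    """|noisy - true| / max(true, sanity_bound); the bound keeps tiny counts from dominating."""
    return abs(noisy_count - true_count) / max(true_count, sanity_bound)

def average_relative_error(true_counts, noisy_counts, dataset_size, sanity_fraction=0.001):
    """Mean relative error over counting queries keyed by a query id."""
    bound = sanity_fraction * dataset_size
    errors = [relative_error(true_counts[q], noisy_counts[q], bound) for q in true_counts]
    return sum(errors) / len(errors)

# Hypothetical query answers: true counts vs. counts derived from a noisy release.
true_counts = {"{bread}": 100, "{beer, wine}": 2}
noisy_counts = {"{bread}": 110, "{beer, wine}": 7}
print(average_relative_error(true_counts, noisy_counts, dataset_size=5000))
```

With a data set of 5000 records the bound is 5, so the small query {beer, wine} is measured against 5 rather than its true count of 2.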
Keywords/Search Tags: differential privacy, taxonomy tree, set-valued data, dynamic set-valued data, data release