The Research On Publishing Set-valued Data For Differential Privacy

Posted on: 2016-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: X F Huang
GTID: 2308330464462434
Subject: Computer technology
Abstract/Summary:
With the rapid development of information technology, it has become convenient to release, collect, store, and analyze all kinds of data on the Internet, for example hospital electronic medical records containing patients' basic information, census records, and so on. Releasing set-valued data, as one such kind of information, benefits data mining, but it also threatens the privacy contained in the data. Research on privacy protection in data publishing arose to resolve this conflict. Because of the characteristics of set-valued data, it is difficult to guarantee data security with traditional means. At present, methods based on a tree structure and its partitioning are popular for set-valued data. This thesis studies the publishing of set-valued data from the perspective of tree-structure partitioning, combined with the differential privacy protection model. The main research contents are as follows.

(1) The existing differential privacy method based on taxonomy tree partitioning does not take the characteristics of the set-valued dataset into account when constructing the taxonomy tree. By analyzing the factors that influence the added noise, this thesis proposes a novel method that releases set-valued data based on the characteristics of the dataset. The method first analyzes the dataset and then dynamically forms the taxonomy tree structure according to the types of records in the dataset and the proportion between the total output of a single record field and the total number of species that appear in the proportional output fields.

(2) On the basis of the theoretical analysis, an experimental platform is built around the construction of the taxonomy tree. The platform consists of three parts: partitioning the original data, distributing records, and adding noise. While building the platform, it was found that the traditional taxonomy-tree publishing method, which represents each record by numbers, cannot handle large amounts of data. To solve this problem, this thesis represents each record as an integer, taking full advantage of the characteristics of this information representation. A bit operation determines whether a record belongs to a partition, which reduces the number of iterations and improves the efficiency of the algorithm. Comparative experiments show that the integer record representation performs better when dealing with large quantities of data.

(3) Based on the theoretical analysis, and using the experimental platform as a bridge, an experimental scheme is designed. The experimental results show that, when the dataset satisfies the required conditions, the proposed method can effectively use the characteristics of the set-valued dataset to construct the taxonomy tree and add less noise. This reduces the distortion of the original data and improves data availability, so that data availability and privacy protection reach a better balance.
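The integer record representation and bit-operation membership test described in (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the item universe, the two taxonomy nodes, and the names `encode`, `in_partition`, and `noisy_count` are all hypothetical, and the Laplace noise with scale 1/epsilon is the standard mechanism for a count query, assumed here rather than taken from the abstract.

```python
import random

# Hypothetical item universe; each item is assigned one bit position.
ITEMS = ["a", "b", "c", "d"]
BIT = {item: 1 << i for i, item in enumerate(ITEMS)}

# Hypothetical taxonomy nodes, each stored as the OR of its leaf items' bits.
NODE_AB = BIT["a"] | BIT["b"]
NODE_CD = BIT["c"] | BIT["d"]


def encode(record):
    """Represent a set-valued record as a single integer bitmask."""
    mask = 0
    for item in record:
        mask |= BIT[item]
    return mask


def in_partition(record_mask, present, absent):
    """A record belongs to the partition if it has at least one item under
    every node in `present` and no item under any node in `absent`.
    Each check is one bitwise AND instead of a loop over the record's items."""
    return (all(record_mask & node for node in present)
            and not any(record_mask & node for node in absent))


def true_count(record_masks, present, absent):
    """Exact number of records that fall into the partition."""
    return sum(in_partition(m, present, absent) for m in record_masks)


def noisy_count(record_masks, present, absent, epsilon, rng):
    """Partition count plus Laplace(0, 1/epsilon) noise (assumed mechanism).
    The difference of two independent exponentials with mean 1/epsilon
    follows a Laplace distribution with scale 1/epsilon."""
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count(record_masks, present, absent) + noise


records = [encode(r) for r in ({"a"}, {"a", "c"}, {"c", "d"}, {"b"})]
print(true_count(records, [NODE_AB], [NODE_CD]))
print(noisy_count(records, [NODE_AB], [NODE_CD], epsilon=1.0, rng=random.Random(0)))
```

In this sketch, testing a record against a taxonomy node costs a single AND regardless of how many items the record holds, which is one plausible reading of how the integer representation reduces the number of iterations.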
Keywords/Search Tags: taxonomy tree, differential privacy, dataset characteristics, set-valued data, data publishing