
Models And Methods For Privacy-Preserving Data Publishing

Posted on: 2016-12-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Z Huang
GTID: 1228330470455943
Subject: Information security
Abstract/Summary:
The era of big data has brought explosive growth in data, which makes data security, and data privacy in particular, especially important. Privacy-preserving data publishing (PPDP) is an active topic in privacy protection. Releasing the original data directly would seriously leak sensitive information, so published data must protect private information while still preserving data utility.

Building on a summary of existing research, this thesis starts from protecting the linkage between record owners and the sensitive attributes in the raw data. Three privacy protection models are proposed, which together provide an improved PPDP service. The main contents of this thesis are as follows:

(1) We review existing research on PPDP. First, we summarize existing PPDP models and analyze their backgrounds, advantages, and disadvantages. Second, we introduce a series of anonymization operations, with an emphasis on generalization; these operations are divided into global recoding and local recoding, each with its own advantages and disadvantages. Third, we summarize the information metrics of PPDP models, which are used both to evaluate model performance and, during implementation, to guide the iterative search of the generalization or specialization space for an optimal solution. Fourth, we introduce dynamic data publishing and its four modes: multiple release publishing, sequential release publishing, continuous data publishing, and collaborative data publishing. Finally, we introduce a series of privacy-preserving models for publishing data with multiple sensitive attributes (SAs).

(2) To deal with similarity attacks on sensitivity, we propose a privacy-preserving model that provides dual protection of both the sensitive value and its sensitivity level. Sensitivity is a kind of ordered classification of an SA.
A sensitivity vulnerability may arise, under which an attacker with limited background knowledge can effectively infer the sensitivity of the victim's sensitive value. The proposed privacy model is an extended general model that avoids this vulnerability while retaining the capabilities of existing PPDP models. It accounts for differences among sensitive values (SVs) by classifying them into sensitivity levels, and it applies to both single and multiple sensitive attributes. A diversity model is used to validate the capability of the general model, and extensive experiments verify its improved effectiveness and efficiency. Furthermore, we present a Levels of Sensitive Values (LSV) measure to calculate the sensitivity level.

(3) To deal with attacks based on both ordered and unordered classifications of an SA, we propose a (w, y, k)-anonymity model. Classification methods for a sensitive attribute are divided into ordered and unordered ones, and attacks based on classification are addressed from this perspective. While protecting the linkage between individuals and sensitive values, the model applies different degrees of privacy protection to different sensitivity levels. It also prevents the sensitive values within an equivalence class from concentrating in a single ordered or unordered class, so it resists both kinds of similarity attacks. We prove that the optimal anonymization problem for this model is NP-hard. The model is implemented by a top-down local recoding algorithm, and experiments verify its privacy-preserving capability.

(4) For streaming data, we set up a privacy-preserving model that resists attacks based on both ordered and unordered classifications of an SA. Given the dynamic nature of streaming data, incoming data are cached in a window of limited size.
The classes of the sensitive attribute are adjusted dynamically as data arrive at different times. The resulting privacy protection model prevents similarity attacks on streaming data based on either an ordered or an unordered class. It is implemented by a top-down local recoding window algorithm, which improves execution efficiency.

In summary, this thesis studies PPDP and proposes several PPDP models for different privacy requirements and application environments. These models protect the linkage between individuals and SAs and improve the capacity to protect private information.
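Item (1) contrasts global and local recoding and mentions information metrics used to score a generalization. The following is a minimal sketch of only the global-recoding side, on a hypothetical six-record table: every age is mapped to a fixed-width interval and every ZIP code is truncated with the same parameters, after which the table is checked for k-anonymity and scored with the discernibility metric (sum of squared equivalence-class sizes), one commonly used information metric. The table, the helper names (`generalize_age`, `global_recode`, and so on) and the choice of metric are illustrative assumptions; the abstract does not say which metrics the thesis surveys.

```python
from collections import Counter

# Hypothetical toy microdata: (age, zipcode, disease).
# Age and zipcode are quasi-identifiers (QIs); disease is the sensitive attribute.
RECORDS = [
    (23, "47677", "flu"),
    (27, "47602", "gastritis"),
    (25, "47678", "HIV"),
    (41, "47905", "flu"),
    (48, "47909", "cancer"),
    (45, "47906", "bronchitis"),
]


def generalize_age(age: int, width: int) -> str:
    """Map an age onto a fixed-width interval, e.g. 23 -> '20-29' for width 10."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zipcode: str, keep: int) -> str:
    """Keep the first `keep` digits and mask the remainder with '*'."""
    return zipcode[:keep] + "*" * (len(zipcode) - keep)


def global_recode(records, age_width: int, zip_keep: int):
    """Global recoding: every record is generalized with the same parameters,
    so equal original values always map to the same generalized value."""
    return [(generalize_age(a, age_width), generalize_zip(z, zip_keep), s)
            for a, z, s in records]


def equivalence_class_sizes(table):
    """Count how many records share each (generalized) QI combination."""
    return Counter((a, z) for a, z, _ in table).values()


def is_k_anonymous(table, k: int) -> bool:
    return all(size >= k for size in equivalence_class_sizes(table))


def discernibility(table) -> int:
    """Discernibility metric: sum of squared equivalence-class sizes (lower is better)."""
    return sum(size * size for size in equivalence_class_sizes(table))


if __name__ == "__main__":
    raw = [(str(a), z, s) for a, z, s in RECORDS]
    generalized = global_recode(RECORDS, age_width=10, zip_keep=3)
    print(is_k_anonymous(raw, 2), discernibility(raw))                  # False 6
    print(is_k_anonymous(generalized, 3), discernibility(generalized))  # True 18
```

Under local recoding, by contrast, two records with the same age could be generalized to different intervals, which usually loses less information at the cost of a published table in which the same original value can appear under different generalized values.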
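Items (2) and (3) target similarity attacks: even when an equivalence class contains several distinct sensitive values, an attacker may still learn the victim's sensitivity level (an ordered classification) or semantic category (an unordered classification) if most values in the class fall into one class. The abstract does not define the parameters of the thesis's models, so what follows is only a hypothetical check in the same spirit, not the thesis's (w, y, k)-anonymity. It requires distinct l-diversity, caps the share of each sensitivity level with tighter caps for more sensitive levels, and bounds the share of each semantic category by `alpha` (similar in form to the frequency bound of (α, k)-anonymity). The level and category tables and the `LEVEL_CAP` values are made up for illustration.

```python
from collections import Counter
from typing import Callable, Dict, Sequence

# Hypothetical ordered classification: sensitivity level of each sensitive value
# (1 = low, 3 = high). The thesis's LSV measure is not reproduced here.
SENSITIVITY_LEVEL: Dict[str, int] = {
    "flu": 1, "gastritis": 1, "bronchitis": 1,
    "hepatitis": 2, "HIV": 3, "cancer": 3,
}

# Hypothetical unordered classification: a semantic category for each value.
CATEGORY: Dict[str, str] = {
    "flu": "respiratory", "bronchitis": "respiratory",
    "gastritis": "digestive", "hepatitis": "digestive",
    "HIV": "immune", "cancer": "oncology",
}

# Hypothetical per-level caps: more sensitive levels may occupy a smaller share of a class.
LEVEL_CAP: Dict[int, float] = {1: 1.0, 2: 0.5, 3: 0.4}


def shares(values: Sequence[str], classify: Callable[[str], object]) -> Dict[object, float]:
    """Fraction of an equivalence class that falls into each class of a classification."""
    counts = Counter(classify(v) for v in values)
    return {cls: c / len(values) for cls, c in counts.items()}


def resists_similarity_attack(values: Sequence[str], l: int, alpha: float) -> bool:
    """Check one equivalence class, given the list of its sensitive values."""
    if len(set(values)) < l:  # distinct l-diversity on the raw values
        return False
    # Ordered classification: each sensitivity level must stay under its own cap.
    level_ok = all(share <= LEVEL_CAP[lvl]
                   for lvl, share in shares(values, SENSITIVITY_LEVEL.__getitem__).items())
    # Unordered classification: no single category may exceed alpha.
    category_ok = all(share <= alpha
                      for share in shares(values, CATEGORY.__getitem__).values())
    return level_ok and category_ok


if __name__ == "__main__":
    print(resists_similarity_attack(["flu", "HIV", "gastritis"], l=3, alpha=0.5))         # True
    print(resists_similarity_attack(["HIV", "cancer", "hepatitis"], l=3, alpha=0.5))      # False: level 3 share 2/3
    print(resists_similarity_attack(["flu", "bronchitis", "gastritis"], l=3, alpha=0.5))  # False: respiratory 2/3
```

The second and third classes are both 3-diverse, yet one leaks "highly sensitive" with probability 2/3 and the other leaks "respiratory disease" with probability 2/3, which is the kind of ordered and unordered leakage items (2) and (3) describe.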
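Items (3) and (4) implement their models with a top-down local recoding algorithm. As a hedged illustration of the general top-down idea (not the thesis's algorithm, whose details the abstract omits), the sketch below performs a Mondrian-style specialization on a single numeric quasi-identifier: all records start in one group, which is split at the median age for as long as both halves keep at least `k` records and `l` distinct sensitive values. Because the split is by position, records with equal ages may land in different groups and receive different age ranges, which makes this local rather than global recoding. The record schema and the function names are assumptions.

```python
from typing import List, Tuple

Record = Tuple[int, str]  # hypothetical schema: (age, sensitive value)


def is_valid(group: List[Record], k: int, l: int) -> bool:
    """A group may be published if it has >= k records and >= l distinct sensitive values."""
    return len(group) >= k and len({sv for _, sv in group}) >= l


def top_down_partition(group: List[Record], k: int, l: int) -> List[List[Record]]:
    """Start from one group holding every record and split it at the median age
    for as long as both halves remain valid (Mondrian-style specialization)."""
    group = sorted(group)
    mid = len(group) // 2
    left, right = group[:mid], group[mid:]
    if is_valid(left, k, l) and is_valid(right, k, l):
        return top_down_partition(left, k, l) + top_down_partition(right, k, l)
    return [group]  # cannot be specialized further


def publish(groups: List[List[Record]]) -> List[Tuple[str, str]]:
    """Replace each age by its own group's min-max range (local recoding)."""
    released = []
    for g in groups:
        age_range = f"{g[0][0]}-{g[-1][0]}"  # groups are sorted by age
        released.extend((age_range, sv) for _, sv in g)
    return released


if __name__ == "__main__":
    data = [(23, "flu"), (27, "gastritis"), (25, "HIV"), (41, "flu"),
            (48, "cancer"), (45, "bronchitis"), (30, "flu"), (35, "HIV")]
    for row in publish(top_down_partition(data, k=2, l=2)):
        print(row)
```

With k = 2 and l = 2 on these eight sample records, the partition yields four groups of two records each; raising k to 3 stops the recursion after the first split. The function assumes the whole input is itself valid, so a caller would check `is_valid` on the full table first.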
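Item (4) caches streaming data in a window of limited size before publishing. A minimal sketch of that buffering idea, under assumptions stated here and in the code: each arriving tuple is appended to a bounded window; as soon as the window holds at least `k` tuples with at least `l` distinct sensitive values, the whole window is released as one equivalence class with its ages generalized to a single range; if the window fills up without becoming releasable, the oldest tuple is suppressed. The class name `WindowAnonymizer`, the release rule, and the suppression policy are hypothetical simplifications, not the thesis's window algorithm, which additionally adjusts the sensitive-attribute classes over time.

```python
from collections import deque
from typing import Deque, List, Tuple

Record = Tuple[int, str]    # hypothetical schema: (age, sensitive value)
Released = Tuple[str, str]  # (generalized age range, sensitive value)


class WindowAnonymizer:
    """Hypothetical bounded-window anonymizer for a tuple stream."""

    def __init__(self, k: int, l: int, max_window: int):
        if max_window < k:
            raise ValueError("the window must be able to hold at least k tuples")
        self.k, self.l, self.max_window = k, l, max_window
        self.window: Deque[Record] = deque()
        self.suppressed = 0  # tuples dropped because no valid group formed in time

    def _releasable(self) -> bool:
        return (len(self.window) >= self.k
                and len({sv for _, sv in self.window}) >= self.l)

    def push(self, record: Record) -> List[Released]:
        """Add one tuple; return the tuples published by this arrival (possibly none)."""
        self.window.append(record)
        if self._releasable():
            ages = [a for a, _ in self.window]
            age_range = f"{min(ages)}-{max(ages)}"
            out = [(age_range, sv) for _, sv in self.window]
            self.window.clear()
            return out
        if len(self.window) == self.max_window:
            # Assumed policy: a full but non-diverse window suppresses its oldest tuple.
            self.window.popleft()
            self.suppressed += 1
        return []


if __name__ == "__main__":
    anon = WindowAnonymizer(k=3, l=2, max_window=4)
    stream = [(23, "flu"), (27, "flu"), (25, "HIV"), (41, "flu"),
              (48, "flu"), (45, "flu"), (44, "flu"), (40, "cancer")]
    for record in stream:
        released = anon.push(record)
        if released:
            print(released)
    print("suppressed:", anon.suppressed)
```

Published stream-anonymization schemes typically also bound how long a tuple may wait in the buffer, trading publication delay against information loss; in this sketch the window size plays that role only indirectly.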
Keywords/Search Tags: Data security, Privacy preserving, Big data, Data publishing, Anonymization, Sensitive attribute, Sensitivity level, Streaming data