Research On Anonymization Privacy Protection Techniques To Data Publishing

Posted on: 2017-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: J Hu
GTID: 2348330533450337
Subject: Information and Communication Engineering

Abstract/Summary:
As one of the most important means of resource sharing, data publishing provides a convenient and fast way to publish and share information. If the sensitive attributes in published data are not protected, private information can leak and cause immeasurable losses.

For a single sensitive attribute, the l-diversity model based on domain generalization causes unnecessary information loss, which reduces the usability of the anonymized data. The sensitive attribute values also remain vulnerable to similarity attacks and skewness attacks. For multiple sensitive attributes, the multi-dimensional bucket grouping approach is commonly used, but its l-diversity grouping principle for composite sensitive attributes places overly strict requirements on the distribution of sensitive attribute values, which leads to a high suppression ratio. Moreover, the approach is only suitable when the number of sensitive attributes is relatively small; as that number grows, the additional information loss and the suppression ratio produced by the approach grow as well. In view of these problems, the research work and innovations of this thesis are as follows:

1. To address the problem of protecting a single sensitive attribute, this thesis proposes a clustering-based l-diversity anonymization algorithm. The algorithm uses clustering to generate equivalence classes and performs local generalization to reduce information loss. Because this algorithm cannot effectively prevent similarity attacks and skewness attacks, the thesis improves it and proposes an (l, c)-diversity anonymization algorithm based on sensitivity grouping constraints. Sensitive attribute values are divided into several sensitivity groups according to their sensitivity. By setting constraints on the sensitivity groups and a maximum frequency threshold for sensitive attribute values, the improved algorithm provides better privacy protection.

2. To address the problem of protecting multiple sensitive attributes, this thesis proposes a (p, l)-anonymization privacy protection model based on correlation-based division of multiple sensitive attributes. First, the sensitive attributes are classified according to their correlation, computed with the information gain method, to reduce the dimension. Second, the sensitive attributes are grouped according to the (p, l)-diversity grouping principle, so that the published data resists skewness attacks and lowers the risk of background knowledge attacks. Finally, the model is implemented using clustering techniques. The results show that the additional information loss and the suppression ratio are small and that the published data has higher usability.
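
To make the idea of l-diversity combined with a maximum frequency threshold more concrete, the following is a minimal sketch of how one equivalence class might be checked. It is not the thesis's algorithm: the abstract does not define the (l, c) constraint precisely, so this sketch assumes distinct l-diversity (at least l different sensitive values per class) and treats c as an upper bound on the relative frequency of any single sensitive value. The function names and the sample values are hypothetical.

```python
from collections import Counter


def has_distinct_l_diversity(sensitive_values, l):
    """Assumed definition: the class holds at least l distinct sensitive values."""
    return len(set(sensitive_values)) >= l


def respects_frequency_cap(sensitive_values, c):
    """Assumed definition: no sensitive value exceeds fraction c of the class."""
    counts = Counter(sensitive_values)
    most_common_count = max(counts.values())
    return most_common_count / len(sensitive_values) <= c


def check_equivalence_class(sensitive_values, l, c):
    """Return True if the class passes both assumed checks."""
    if not sensitive_values:
        return False  # an empty class cannot satisfy either constraint
    return (has_distinct_l_diversity(sensitive_values, l)
            and respects_frequency_cap(sensitive_values, c))


# Hypothetical equivalence classes, listed by their sensitive attribute values.
skewed_class = ["flu", "flu", "flu", "HIV"]
balanced_class = ["flu", "HIV", "cancer", "flu"]

skewed_ok = check_equivalence_class(skewed_class, l=2, c=0.5)
balanced_ok = check_equivalence_class(balanced_class, l=3, c=0.5)
```

Under these assumptions, the first class has two distinct values but fails the frequency cap, since "flu" makes up three quarters of it, which is the kind of skew that distinct l-diversity alone would not catch. The second class passes both checks. The sensitivity grouping constraints mentioned in the abstract are not modeled here.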
Keywords/Search Tags: data publishing, sensitive attribute, privacy protection, l-diversity