
Research On Data Sanitization-based Strategy Of Sensitive Information Protection

Posted on: 2014-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y He
Full Text: PDF
GTID: 2248330398450106
Subject: Systems Engineering
Abstract/Summary:
With the development of information technology, data users apply a variety of data mining techniques to databases published by data owners in order to extract useful knowledge from massive data. On the one hand, enterprises profit greatly from the rapid development of data mining technology. On the other hand, data mining increases the risk of exposing potentially sensitive knowledge contained in the database. In an era of economic globalization, a data owner cannot simply refuse to release data, because doing so undermines the sustainable development of business cooperation between enterprises. Sharing databases therefore becomes a necessary prerequisite for a win-win outcome. For the benefit of all parties, ensuring that sensitive information stays hidden while the data can still be mined has important theoretical and practical significance.

In this thesis, we design sanitization-based privacy-preserving strategies for the differing requirements of enterprises. For data owners who care about the accuracy of the database and the information lost after releasing it, we design a model-based hiding strategy. For data owners with different risk preferences, we design a heuristic-algorithm-based hiding strategy. The proposed privacy protection method differs from methods that merely hide sensitive attributes: the object of protection is the sensitive information contained in the database, so that data owners can protect potential core competitive business knowledge.

First, we establish a constraint satisfaction model for the problem of hiding sensitive frequent itemsets. Specifically, it is a multi-objective optimization model that meets the multiple requirements on accuracy and information loss. In analyzing the model, we derive the optimal strategy through qualitative and quantitative analysis, and we propose a method for transforming the nonlinear constraints into linear constraints. In addition, we modify the original model using border theory. The resulting pruned model-based hiding strategy is considerably more efficient than the original one.

Second, to account for data owners' different risk preferences, we introduce the concept of risk exposure and put forward a heuristic-algorithm-based strategy for hiding frequent itemsets. The optimal hiding strategy is obtained from a quantitative analysis of the sanitization of non-sensitive information. Numerical experiments show that, in terms of minimizing information loss, the proposed hiding strategy outperforms existing heuristic algorithms.

Throughout the study, we build on existing research results and examine privacy-preserving data mining and sanitization strategies in depth. The results can provide theoretical support, decision support, and practical guidance for enterprises that need to modify a database at the data sharing stage.
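To make the sensitive frequent itemset hiding problem concrete, the following is a minimal sketch of a generic greedy sanitizer. It is not the model-based or risk-exposure-based strategy of the thesis, whose details the abstract does not give. The function names (`support`, `frequent_itemsets`, `sanitize`), the toy database, and the threshold `min_sup=3` are all hypothetical, and the sketch assumes `min_sup >= 1`. It deletes one item at a time from transactions that support a sensitive itemset, each time choosing the deletion that keeps the most non-sensitive frequent itemsets frequent, which is one simple way to measure information loss.

```python
from itertools import combinations


def support(db, itemset):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def frequent_itemsets(db, min_sup):
    """Brute-force all itemsets with support >= min_sup (toy data only)."""
    items = sorted(set().union(*db))
    return {
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
        if support(db, frozenset(c)) >= min_sup
    }


def sanitize(db, sensitive, min_sup):
    """Greedily delete items until every sensitive itemset is infrequent."""
    db = [set(t) for t in db]  # work on a copy; the input is left untouched
    # Non-sensitive frequent itemsets that should ideally stay frequent.
    # Supersets of a sensitive itemset are excluded: hiding it hides them too.
    keep = {f for f in frequent_itemsets(db, min_sup)
            if not any(s <= f for s in sensitive)}
    for s in sensitive:
        while support(db, s) >= min_sup:
            best = None  # (number of itemsets kept, transaction index, item)
            for i, t in enumerate(db):
                if not s <= t:
                    continue  # only supporting transactions lower the support
                for item in sorted(s):
                    t.remove(item)  # tentative deletion
                    kept = sum(1 for f in keep if support(db, f) >= min_sup)
                    t.add(item)  # undo the tentative deletion
                    if best is None or kept > best[0]:
                        best = (kept, i, item)
            db[best[1]].remove(best[2])  # commit the least harmful deletion
    return db, keep


# Hypothetical toy database: each transaction is a set of items.
db = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"},
      {"a", "c"}, {"b", "c"}, {"a", "b", "d"}]
sensitive = [frozenset({"a", "b"})]
sanitized, keep = sanitize(db, sensitive, min_sup=3)
# Non-sensitive frequent itemsets that became infrequent (the side effect).
lost = {f for f in keep if support(sanitized, f) < 3}
```

On this toy database the greedy choice hides the itemset {a, b} and leaves `lost` empty. On real data some loss is usually unavoidable, and both the brute-force enumeration and the exhaustive candidate search would need to be replaced by something more scalable.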
Keywords/Search Tags: Data Mining, Privacy Preserving, Data Sanitization, Border Theory