Font Size: a A A

Privacy Preserving In Association Rule Mining

Posted on:2016-09-22Degree:DoctorType:Dissertation
Country:ChinaCandidate:P ChengFull Text:PDF
GTID:1108330503469719Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Various data mining techniques have been used to discover useful and unknown knowledge from large collections of data. However, there is a risk of disclosing sensitive information when data is shared between different organizations. The shared data may contain sensitive knowledge, such as personal secrets, business strategies, business benefit conflict,.or other things related to policy enforcement. Exposure of sensitive knowledge may do harm to the data owner. Thus,the balance between legitimate mining needs and the protection of confidential knowledge when data is shared must be carefully managed.This study focuses on privacy preserving in association rule mining. This field can be further divided into protecting sensitive data and protecting sensitive knowledge contained in data. This work concentrates on the latter. A feasible way to ensure the confidentiality is to sanitize the database and conceal sensitive knowledge in forms of association rules. However, the sanitization process often produces side effects, including missing non-sensitive rules, ghost rules and data distortion. Thus, minimizing these side effects is a significant and challenging task in this field.In this study, the author has investigated different ways to address the problem of association rule hiding, and propose several new association rule hiding methods. Each of them is evaluated by extensive experiments. The main work in this dissertation is summarized as follows.The problem of association rule hiding is studied from a view of multi-objective optimization. Most existing solutions try to reduce side effects by utilizing some local features of a problem. Usually, they only can find a local optimal solution. In addition, there exists a tradeoff when minimizing different side effects simultaneously. However, this fact has been ignored in past literatures. In this dissertation, a new hiding approach based on evolutionary multi-objective optimization(EMO) is proposed, named as EMO-RH, which can hide sensitive rules with minimizing different side effects simultaneously. Experimental results show that EMO-RH can effectively reduce side effects. EMO-RH is robust to datasets and parameter settings. It can produce multiple solutions in a single run. This presents an opportunity for the user to freely choose the preferred one.In order to utilize the advantage of heuristic algorithms, such as efficiency and scalability, a new heuristic hiding method is proposed, named as Relevance-sorting. It hides sensitive rules by removing some items in a database to reduce the support or confidence of sensitive rules below specified thresholds. To reduce side effects, it utilizes the count of non-sensitive itemsets contained by each transaction to select candidates for modification. Comparative experiment results show that Relevance-sorting may reduce missing non-sensitive rules effectively.The dissertation provides an initial insight into the theory on association rule hiding. By analyzing the cause of side effects, the author proposes the theory of “border rule”, and presents a hiding solution based on border rules, named as BRDA. Two kinds of border rules are defined: positive border rules and negative border rules. Positive border rules are such rules which can easily become missing by mistake in the sanitization process. All missing non-sensitive rules come from positive border rules. Negative border rules are pre-strong rules that may easily become ghost rules. All ghost rules come from negative border rules. BRDA utilizes the relation between supporting transactions and border rules to find candidate transactions for modification. Comparative experiment results demonstrate that BRDA can hide sensitive rules with much fewer side effects and database modifications.The blocking-based rule hiding way is investigated in this dissertation. A new blocking-based hiding method called as BRBA has been proposed. Different from the distortion-based way adopted by the above methods, the blocking-based way will not introduce false values into data. The user can differentiate which items have not been modified. This is necessary in some applications. In the blocking-based way, the support or confidence of a rule is not a single value again but an interval, called as the support interval and the confidence interval. BRBA reduces the lower bound of the confidence interval below a specified threshold to hide a sensitive rule. The risk of blocking-based way is that an attacker could find the sensitive rules by analyzing data features. To reduce this risk, the number of ghost rules should be maintained to a certain level, since they behold the same characteristics with hidden sensitive rules and attackers are difficult to identify which ones are sensitive among them.
Keywords/Search Tags:Association rule hiding, privacy preserving data mining, sensitive knowledge, multi-objective optimization
PDF Full Text Request
Related items