
Research on Privacy-Preserving High-Utility Mining

Posted on: 2017-04-13 | Degree: Master | Type: Thesis
Country: China | Candidate: G Lin | Full Text: PDF
GTID: 2308330503987049 | Subject: Computer technology
Abstract/Summary:
A major concern for data owners is finding useful information in large amounts of data and transforming it into understandable knowledge, which can then serve as the basis for effective strategies or decisions. Fundamental approaches to knowledge discovery in databases (KDD) are frequent itemset mining (FIM) and association-rule mining (ARM), which find implicit and potential relationships among purchased items in binary databases. High-utility itemset mining (HUIM) extends FIM by considering both the quantity and the profit of items when measuring the utility of itemsets in a database. During data collection or distribution, however, confidential or sensitive information (e.g., credit card numbers, annual salaries, or personal phone numbers) can be discovered by various data mining techniques; such information poses security threats and should be hidden before the data is published or shared with collaborators. In HUIM specifically, the discovered high-utility itemsets can reveal information of interest to competitors, especially when data is shared for cooperation. Privacy-preserving utility mining (PPUM), which hides sensitive high-utility itemsets before the data is published or shared with collaborators, has therefore become a critical issue.
The motivation of this dissertation has two parts. First, it aims to find the sensitive high-utility itemsets (HUIs) in large amounts of data. Second, it seeks new ways to hide these sensitive HUIs efficiently, with minimal side effects during data sanitization. Instead of the traditional approach, in which the sensitive HUIs are given in advance, we first design a method that analyzes user requirements to detect the sensitive HUIs that need to be hidden. Because users may not know which itemsets should be hidden, an approach for automatically detecting sensitive itemsets is proposed.
The detection method analyzes the given itemsets to compute a sensitivity degree for each one, and selects the K itemsets with the highest degrees as the sensitive HUIs to be hidden. In addition, three methods based on the maximal sensitive utility of transactions (MSU) are developed to hide the sensitive HUIs, whether those HUIs are specified by users or detected automatically by the first approach. Since the side-effect criteria used in privacy-preserving data mining (PPDM) are insufficient for evaluating algorithms designed for PPUM, three new criteria are designed to measure their performance. Experimental results show that the designed algorithms outperform state-of-the-art PPUM algorithms: their side effects are generally reduced by 10% compared with previous works, especially on very dense datasets. Moreover, the designed algorithms need fewer operations to modify transactions and sanitize the entire database, thus preserving the integrity of the original database as much as possible.
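The abstract relies on two ideas: the utility of an itemset (quantity times unit profit, summed over the transactions that contain it) and MSU-driven sanitization, which lowers a sensitive itemset's utility below the minimum utility threshold. The Python sketch below illustrates both under stated assumptions and is not the thesis's algorithm. The toy database, the threshold, and the function names `utility_in`, `utility`, `high_utility_itemsets`, and `hide_msu` are hypothetical; HUIs are found by brute force; and the hiding rule (repeatedly pick the transaction where the sensitive itemset has its maximal utility, then cut the quantity of that itemset's highest-utility item there) is only one plausible MSU-style heuristic, not necessarily any of the three methods developed here. The automatic sensitivity-degree detection and the three new evaluation criteria are not modeled.

```python
from itertools import combinations

# Hypothetical toy data (not from the thesis): unit profit per item, and each
# transaction maps item -> purchased quantity.
PROFIT = {"a": 4, "b": 2, "c": 1, "d": 6}
DB = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 2, "d": 1},
    {"b": 4, "c": 1, "d": 2},
    {"a": 1, "b": 1, "d": 1},
]


def utility_in(itemset, tx, profit):
    """u(X, T): sum of quantity * unit profit over X's items; 0 if T lacks any of them."""
    if not all(i in tx for i in itemset):
        return 0
    return sum(tx[i] * profit[i] for i in itemset)


def utility(itemset, db, profit):
    """u(X): utility of X summed over all transactions that contain X."""
    return sum(utility_in(itemset, tx, profit) for tx in db)


def high_utility_itemsets(db, profit, min_util):
    """Brute-force HUI enumeration: fine for toy data, exponential in general."""
    items = sorted({i for tx in db for i in tx})
    huis = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            u = utility(combo, db, profit)
            if u >= min_util:
                huis[frozenset(combo)] = u
    return huis


def hide_msu(sensitive, db, profit, min_util):
    """Hypothetical greedy hiding sketch in the spirit of MSU (an assumption, not
    the thesis's exact method): while the sensitive itemset is still high-utility,
    take the transaction where it has maximal utility and lower the quantity of
    its highest-utility item there. Returns (sanitized copy, number of edits)."""
    assert min_util > 0, "min_util must be positive for the loop to terminate"
    db = [dict(tx) for tx in db]  # work on a copy; the caller's data is untouched
    edits = 0
    while (total := utility(sensitive, db, profit)) >= min_util:
        tx = max(db, key=lambda t: utility_in(sensitive, t, profit))
        item = max(sensitive, key=lambda i: tx[i] * profit[i])
        excess = total - min_util + 1  # utility that must still be removed
        cut = min(tx[item], -(-excess // profit[item]))  # ceiling division, capped
        tx[item] -= cut
        if tx[item] == 0:
            del tx[item]  # quantity 0 means the item leaves the transaction
        edits += 1
    return db, edits


MIN_UTIL = 20
before = high_utility_itemsets(DB, PROFIT, MIN_UTIL)
sensitive = frozenset({"b", "d"})
sanitized, edits = hide_msu(sensitive, DB, PROFIT, MIN_UTIL)
after = high_utility_itemsets(sanitized, PROFIT, MIN_UTIL)
hiding_failure = sensitive in after  # sensitive HUI still discoverable?
missing = set(before) - set(after) - {sensitive}  # non-sensitive HUIs lost
print(edits, hiding_failure, sorted("".join(sorted(s)) for s in missing))
# prints: 1 False ['bcd', 'd']
```

In this toy run a single edit hides {b, d}, but it also removes the non-sensitive HUIs {d} and {b, c, d}; this loss is the kind of side effect that the designed algorithms aim to minimize. Because this style of sanitization only lowers quantities, no itemset's utility can increase, so no spurious new HUIs can appear.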
Keywords/Search Tags: high utility privacy preserving, itemset hiding, minimum side effects, auto detection, maximum sensitive utility