
Research On Privacy Preserving Approaches For Frequent Itemset Mining And High-Utility Itemset Mining

Posted on: 2020-08-18
Degree: Master
Type: Thesis
Country: China
Candidate: S X Li
GTID: 2428330599957027
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of electronic information technology, the cost of acquiring and storing data has been significantly reduced, and the ability to process data has greatly improved. As an effective means of extracting useful information from data, data mining techniques have been widely studied and applied in recent years. Among them, frequent itemset mining and its derivative techniques, which aim to discover high-value patterns, have played an increasingly important role in knowledge mining on large-scale datasets. However, sensitive high-value itemsets are at risk of being compromised when a dataset is exposed or shared. How to preserve the privacy of sensitive itemsets when the data consumer changes has therefore become a crucial research topic. In recent years, researchers have proposed various privacy preserving methods for high-value itemset mining based on different theories and techniques. However, while hiding frequent itemsets or high-utility itemsets, these methods degrade the utility of the dataset itself to varying degrees, both by losing important information and by generating incorrect information. Preserving as much data utility as possible while implementing a privacy preserving strategy is therefore another focus of the subject. The problem has been proven to be NP-hard, and the solutions proposed so far have failed to reduce the loss of data utility to a desirable level.

To address the privacy leakage issue in frequent itemset mining and high-utility itemset mining, this thesis contributes two novel models: a privacy preserving model for frequent itemsets based on dataset reconstruction, and a privacy preserving model for high-utility itemsets based on integer linear programming. The former adopts the idea of reconstructing the dataset. The sensitive information is first removed from the mining results of the original dataset, and an inverse frequent itemset mining technique is then used to reconstruct the dataset from the remaining nonsensitive frequent itemsets. Finally, the dataset is extended to complete the task of hiding the sensitive frequent itemsets. Because the confidential information is removed in preprocessing, the reconstructed dataset no longer contains any sensitive itemsets, and the characteristics of the reconstruction algorithm also keep the loss of data utility to a minimum.

The latter formulates the whole process of hiding sensitive high-utility itemsets while minimizing the loss of data utility as a constraint satisfaction problem, which is mapped to an equivalent integer linear programming problem and solved; the original dataset is then perturbed according to the solution. By using two auxiliary table structures, the model can establish the constraint satisfaction problem with only one scan of the dataset, which greatly reduces the running time of the proposed model. Moreover, the validity of the itemsets in the constraint satisfaction problem is guaranteed, so the large amount of information loss caused by invalid itemsets is avoided and data utility is preserved as much as possible. In addition, extensive comparative experiments are conducted on real datasets to verify the effectiveness and superiority of the proposed models.
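
To make the integer linear programming formulation described above more concrete, the following is a minimal sketch and not the thesis's algorithm. It assumes a toy quantity database (`DB`), a hypothetical unit-profit table (`PROFIT`), and a single sensitive itemset. The decision variables are the new item quantities in the transactions that support the sensitive itemset, the constraint is that the itemset's utility falls below the minimum utility threshold, and the objective is the smallest total utility reduction. Every quantity is kept at 1 or more so that each constraint stays linear; that is an assumption of this sketch. The tiny integer program is solved by exhaustive search rather than a real solver, and the auxiliary table structures mentioned in the abstract are not modeled.

```python
from itertools import product

# Hypothetical toy data: unit profit per item and item quantities per transaction.
PROFIT = {"a": 5, "b": 2, "c": 1}
DB = [
    {"a": 2, "b": 3},
    {"a": 1, "b": 1, "c": 4},
    {"b": 2, "c": 1},
]


def itemset_utility(db, profit, itemset):
    """Sum quantity * profit of the itemset's items over every transaction containing it."""
    total = 0
    for transaction in db:
        if all(item in transaction for item in itemset):
            total += sum(transaction[item] * profit[item] for item in itemset)
    return total


def hide_itemset(db, profit, sensitive, min_util):
    """Return a sanitized copy of db in which utility(sensitive) < min_util.

    Brute-force stand-in for an ILP solver: enumerate every integer
    assignment of the decision variables and keep the feasible one with
    the smallest utility loss. Returns None if no assignment is feasible.
    """
    # One decision variable per (transaction index, item) cell that
    # contributes to the sensitive itemset's utility.
    cells = [
        (tid, item)
        for tid, transaction in enumerate(db)
        if all(i in transaction for i in sensitive)
        for item in sorted(sensitive)
    ]
    # Assumption of this sketch: quantities may shrink but stay >= 1.
    domains = [range(1, db[tid][item] + 1) for tid, item in cells]

    best = None  # (loss, assignment)
    for assignment in product(*domains):
        new_util = sum(q * profit[item] for q, (_, item) in zip(assignment, cells))
        if new_util >= min_util:
            continue  # hiding constraint violated
        loss = sum(
            (db[tid][item] - q) * profit[item]
            for q, (tid, item) in zip(assignment, cells)
        )
        if best is None or loss < best[0]:
            best = (loss, assignment)

    if best is None:
        return None
    sanitized = [dict(transaction) for transaction in db]
    for q, (tid, item) in zip(best[1], cells):
        sanitized[tid][item] = q
    return sanitized


sensitive = {"a", "b"}
sanitized = hide_itemset(DB, PROFIT, sensitive, min_util=20)
print(itemset_utility(DB, PROFIT, sensitive))         # 23
print(itemset_utility(sanitized, PROFIT, sensitive))  # 19
```

In this toy case the cheapest fix lowers the quantity of `b` in the first transaction from 3 to 1, which removes 4 units of utility. Exhaustive search is only workable at this scale; a realistic instance would hand the same variables, constraint, and objective to an ILP solver, and would also add constraints that protect nonsensitive high-utility itemsets.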
Keywords/Search Tags: frequent itemset, high-utility itemset, dataset reconstruction, integer linear programming, privacy preserving