Research On Privacy Preserving Approaches For Frequent Itemset Mining And High-Utility Itemset Mining

Posted on:2020-08-18

Degree:Master

Type:Thesis

Country:China

Candidate:S X Li

GTID:2428330599957027

Subject:Signal and Information Processing

Abstract/Summary:

With the rapid development of electronic information technology, the cost of acquiring and storing data has fallen significantly, and the ability to process data has greatly improved. As an effective means of extracting useful information from data, data mining has been widely studied and applied in recent years. In particular, frequent itemset mining and its derivative techniques, which aim to discover high-value patterns, play an increasingly important role in knowledge mining on large-scale datasets. However, sensitive high-value itemsets are at risk of being compromised when a dataset is exposed or shared. How to preserve the privacy of sensitive itemsets when the data consumer changes has therefore become a crucial research problem.

In recent years, researchers have proposed various privacy preserving methods for high-value itemset mining based on different theories and techniques. However, while hiding frequent itemsets or high-utility itemsets, these methods degrade the utility of the dataset itself to varying degrees, both by losing important information and by generating incorrect information. Preserving as much data utility as possible while applying a privacy preserving strategy is therefore another focus of this subject. The problem has been proven to be NP-hard, and the solutions proposed so far fail to reduce the loss of data utility to a desirable level.

To address privacy leakage in frequent itemset mining and high-utility itemset mining, this thesis contributes two novel models: a privacy preserving model for frequent itemsets based on dataset reconstruction, and a privacy preserving model for high-utility itemsets based on integer linear programming.

The former adopts the idea of reconstructing the dataset. The sensitive information is first removed from the mining results of the original dataset, and an inverse frequent itemset mining technique is then used to reconstruct the dataset from the remaining nonsensitive frequent itemsets. Finally, the dataset is extended to complete the task of hiding the sensitive frequent itemsets. Because the confidential information is removed in preprocessing, the reconstructed dataset no longer contains any sensitive itemsets, and the characteristics of the reconstruction algorithm also ensure that the loss of data utility is minimized.

The latter formulates the whole process of hiding sensitive high-utility itemsets while minimizing the loss of data utility as a constraint satisfaction problem, maps it to an equivalent integer linear programming problem, solves it, and then perturbs the original dataset according to the solution. Using two auxiliary table structures, the model can build the constraint satisfaction problem with only one scan of the dataset, which greatly reduces the running time of the proposed model. Moreover, the validity of the itemsets in the constraint satisfaction problem is guaranteed, so the large information loss caused by invalid itemsets is avoided and data utility is preserved as much as possible. In addition, extensive comparative experiments are conducted on real datasets to verify the effectiveness and superiority of the proposed models.
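The trade-off between hiding a sensitive itemset and preserving data utility can be illustrated with a minimal sketch. It is not the thesis's method: it assumes the standard definitions of support (number of transactions containing the itemset) and utility (sum of quantity times unit profit over those transactions), and it perturbs one quantity by hand instead of solving an integer linear program. The toy transactions, the profit table, the threshold `MIN_UTILITY`, and all function names are hypothetical.

```python
import copy
from typing import Dict, Iterable, List

Transaction = Dict[str, int]  # item -> purchased quantity

# Hypothetical toy dataset and unit profits (not taken from the thesis).
transactions: List[Transaction] = [
    {"a": 2, "b": 1},
    {"a": 1, "c": 3},
    {"a": 1, "b": 2, "c": 1},
    {"b": 4},
]
unit_profit: Dict[str, int] = {"a": 5, "b": 2, "c": 1}


def support(itemset: Iterable[str], db: List[Transaction]) -> int:
    """Count the transactions that contain every item of the itemset."""
    items = set(itemset)
    return sum(1 for t in db if items.issubset(t))


def utility(itemset: Iterable[str], db: List[Transaction],
            profit: Dict[str, int]) -> int:
    """Sum quantity * unit profit over transactions containing the itemset."""
    items = set(itemset)
    return sum(
        sum(t[i] * profit[i] for i in items)
        for t in db
        if items.issubset(t)
    )


MIN_UTILITY = 20          # assumed minimum utility threshold
sensitive = {"a", "b"}    # assumed sensitive high-utility itemset

before = utility(sensitive, transactions, unit_profit)

# Hand-picked perturbation: lower the quantity of "a" in the first transaction.
# An ILP-based model would choose such changes by solving an optimization problem.
sanitized = copy.deepcopy(transactions)
sanitized[0]["a"] -= 1

after = utility(sensitive, sanitized, unit_profit)

# Side effect: the nonsensitive itemset {"a"} also loses utility.
side_before = utility({"a"}, transactions, unit_profit)
side_after = utility({"a"}, sanitized, unit_profit)

print("sensitive itemset hidden:", before >= MIN_UTILITY > after)
print("nonsensitive {'a'} lost:", side_before >= MIN_UTILITY > side_after)
```

In this toy case the perturbation hides the sensitive itemset but also pushes the nonsensitive itemset {a} below the threshold, which is the kind of information loss a hiding model tries to keep small.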

Keywords/Search Tags:

frequent itemset, high-utility itemset, dataset reconstruction, integer linear programming, privacy preserving

Related items

1	Research On Novel Methods In Utility Pattern Mining
2	Research On Frequent And Closed High Utility Itemset Mining Algorithm Based On Spark
3	Research On Privacy Preserving Highutility Mining
4	Research On Frequent-High Utility Itemset Mining Based On Multi-Objective Evolutionary Computation
5	Multi-Relational Frequent Pattern Mining Algorithm And Its Application Research
6	Research On Frequent And High-utility Itemset Mining Algorithms Over Data Stream
7	Research Of High Frequent-utility Itemset Mining
8	Research On Algorithms And Their Performance For Frequent Itemset And High Utility Itemset Mining
9	The Research And Application Of Association Rules Mining Algorithms Based On Directed Itemset Graph
10	Research On Mining Frequent Itemsets Algorithm Based On Bittable