
Research Of Sensitive Itemsets Hiding Algorithms Based On Multi-objective Framework

Posted on: 2019-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Zhang
GTID: 2428330590974189
Subject: Computer technology
Abstract/Summary:
Data mining aims to extract important knowledge from massive data. As the technology matures, however, private or sensitive knowledge can be leaked during the mining process. This is a particular concern in business cooperation, where data must be shared among institutions and departments. Shared data may contain sensitive knowledge tied to the interests of the data owner, and a recipient who applies existing data mining techniques to it can recover that knowledge, which poses a significant security threat to the owner. Data owners therefore need to protect the sensitive knowledge in the data before sharing it, while keeping the data as useful for mining as possible.

This dissertation studies privacy protection in frequent pattern mining and aims to prevent sensitive frequent itemsets from leaking. Deleting selected transactions from the original database reduces the frequency of the sensitive itemsets, so that they are hidden from the perspective of data mining. Unfortunately, hiding sensitive itemsets is accompanied by side effects, and the problem has been proved to be NP-hard. The main disadvantage of many earlier methods is that they consider only a single objective and can therefore obtain only a locally optimal solution. In addition, existing hiding methods based on evolutionary computation rely heavily on predefined weights for each side effect in the fitness function, and these weights strongly affect the experimental results. To address these problems, the dissertation adopts multi-objective optimization algorithms. Its main contents and contributions are as follows.

First, to reduce the side effects of sensitive itemset hiding from the perspective of global optimization, the dissertation proposes the pNSGA2DT algorithm, which transforms the sensitive itemset hiding problem into a multi-objective optimization problem and adopts the non-dominated sorting genetic algorithm II (NSGA-II) framework to find a set of optimal solutions. In contrast to existing algorithms, the solutions in this set correspond to different trade-offs among the side effects, so users can choose the solution that best fits their preference or experience. The algorithm also uses the pre-large concept and an improved fast non-dominated sorting procedure to improve efficiency. The experimental results show that pNSGA2DT obtains a set of Pareto-optimal solutions, offering far more choices than existing sensitive information hiding algorithms, and that the proposed optimization strategies greatly improve the algorithm's efficiency.

Second, pNSGA2DT requires many crossover, mutation and selection operations on the population, which take a long time, and the distribution of its solution set leaves room for improvement. The dissertation therefore proposes MOPSO2DT, a framework based on multiple objective particle swarm optimization (MOPSO), which both saves considerable time and improves the distribution of the solution set. Two strategies for updating pbest and gbest during the evolution process are also designed. The experimental results show that MOPSO2DT further improves on pNSGA2DT in both distribution and efficiency.

Third, to improve the distribution of the solution set further, the dissertation adopts PACO2DT, an algorithm based on the Pareto ant colony optimization (PACO) framework. It inherits the fast running time of ant colony optimization (ACO), adopts an effective external memory update strategy, and introduces a novel algorithm for identifying non-dominated solutions that further improves efficiency. The experimental results show that PACO2DT significantly improves both the distribution of the solution set and the efficiency compared with the other proposed multi-objective algorithms.

In summary, this dissertation addresses the limitations of existing sensitive itemset hiding algorithms by applying multi-objective optimization algorithms to the problem, and provides a set of globally optimal solutions from which users can choose. A large number of experiments verify the advantages of the proposed algorithms and show that they improve both the distribution of the solution set and the efficiency of finding it.
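The trade-off among side effects that motivates the multi-objective formulation can be seen on a toy database. The sketch below is not the dissertation's pNSGA2DT, MOPSO2DT or PACO2DT. It only scores a few hand-picked transaction-deletion candidates and keeps the non-dominated ones. The three objectives (hiding failure, missing cost, artificial cost) are an assumption taken from common usage in the itemset-hiding literature, since the abstract does not name its side effects. The database, the 0.5 support ratio, and all function and candidate names are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(db, min_ratio):
    """Brute-force frequent itemset enumeration (toy data only)."""
    if not db:
        return set()
    items = sorted(set().union(*db))
    threshold = min_ratio * len(db)
    found = set()
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            support = sum(1 for t in db if itemset <= t)
            if support >= threshold:
                found.add(itemset)
    return found

def side_effects(db, deleted, sensitive, min_ratio):
    """Return (hiding_failure, missing_cost, artificial_cost) for one candidate.

    Assumed objective definitions, all to be minimized:
      hiding_failure  - sensitive itemsets that are still frequent
      missing_cost    - non-sensitive frequent itemsets that were lost
      artificial_cost - itemsets that became frequent only after deletion
    """
    before = frequent_itemsets(db, min_ratio)
    sanitized = [t for i, t in enumerate(db) if i not in deleted]
    after = frequent_itemsets(sanitized, min_ratio)
    return (
        len(sensitive & after),
        len((before - sensitive) - after),
        len(after - before),
    )

def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(scores):
    """Names of candidates whose objective vector no other candidate dominates."""
    return {
        name for name, s in scores.items()
        if not any(dominates(o, s) for other, o in scores.items() if other != name)
    }

# Hypothetical toy database: "a" occurs only together with "b".
db = [{"a", "b"}, {"a", "b"}, {"a", "b"}, {"c"}, {"c"}, {"b", "c"}]
sensitive = {frozenset({"a", "b"})}

# Hypothetical candidates: sets of transaction indices to delete.
candidates = {
    "keep_all": set(),
    "drop_T0": {0},
    "drop_T3": {3},
    "drop_T0_to_T3": {0, 1, 2, 3},
}
scores = {n: side_effects(db, d, sensitive, 0.5) for n, d in candidates.items()}
front = pareto_front(scores)
print(scores)
print(sorted(front))
```

On this data `keep_all` scores (1, 0, 0) and `drop_T0` scores (0, 1, 0). Because `a` never appears without `b`, hiding `{a, b}` necessarily loses `{a}`, so neither candidate dominates the other and both stay on the front. A weighted-sum fitness function would have to pick between them through its weights, whereas a Pareto-based search returns both and leaves the choice to the user.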
Keywords/Search Tags: sensitive itemsets, privacy-preserving data mining, multi-objective optimization, evolutionary computation