With the development of Internet information technology, data has become an important resource sought by governments and institutions. By analyzing these massive amounts of data, researchers can learn more about the current world, and data mining technology emerged to meet this need. Frequent pattern mining, one type of data mining, has been widely applied in recommendation systems and personalized websites. However, privacy breaches in recent years have posed serious challenges to data mining technology. How to use frequent pattern mining to obtain valuable patterns while protecting personal privacy has become a research hotspot in this field. The differential privacy model offers a new strategy for protecting the privacy of data. Because it rests on a rigorous mathematical model and can effectively resist attacks regardless of an attacker's background knowledge, it has attracted wide attention in academia. How to improve the efficiency of mining algorithms and obtain result sets with high utility under differential privacy has therefore become a focus of research in this field.

This paper studies the efficiency of frequent itemset mining algorithms under differential privacy. Through an in-depth analysis of the factors that limit the efficiency of differentially private algorithms, an improved algorithm is proposed and studied. The main results are as follows:

1) On databases containing a large number of long transactions, the DP-topkP (Differentially Private top-k Pattern Mining) algorithm becomes very time-consuming as the minimum support threshold decreases or the transaction data set grows. We therefore propose an improved, more efficient algorithm, DP-OPtopkP (Differentially Private Optimal top-k Pattern Mining). First, the new algorithm uses a length selection mechanism to preprocess the transaction database. Second, it reduces the size of the set of candidate frequent itemsets obtained by the FP-Growth algorithm by keeping only closed frequent itemsets. Experimental results show that DP-OPtopkP is more efficient and has good utility.

2) On large-scale data sets, the FP-tree built by DP-OPtopkP may not fit in memory, which causes the overall efficiency of the algorithm to drop sharply. We therefore propose a parallel version of DP-OPtopkP whose main idea is to process the data in synchronized batches. First, the truncated data set is partitioned according to predefined requirements. Next, the FP-Growth algorithm is run separately on each partition. Then, the resulting set of candidate frequent itemsets is partitioned and the closed frequent itemset algorithm is run separately on each partition. Finally, the result set is computed. Experimental results on large-scale data sets show that the parallelized DP-OPtopkP algorithm has clear advantages.
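The preprocessing-and-selection pipeline of result 1) can be illustrated with a minimal Python sketch. It rests on several assumptions and is not the DP-OPtopkP implementation. The abstract does not specify the length selection mechanism, so the hypothetical `truncate` keeps a random subset of at most `max_len` items per transaction. Brute-force counting of itemsets up to `max_size` stands in for FP-Growth. The top-k step adds Gumbel noise to support counts, which is a known one-shot equivalent of running the exponential mechanism k times without replacement. The sensitivity of 1 and the even budget split `epsilon / k` are assumptions. Candidate generation itself is not privatized here, so the code shows data flow rather than a privacy guarantee. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations


def truncate(transactions, max_len, rng):
    """Hypothetical length selection: keep a random subset of at most max_len items."""
    out = []
    for t in transactions:
        items = sorted(set(t))
        if len(items) > max_len:
            items = sorted(rng.sample(items, max_len))
        out.append(tuple(items))
    return out


def count_itemsets(transactions, max_size):
    """Support of every itemset of size 1..max_size (brute force, stands in for FP-Growth).

    Expects sorted, duplicate-free transactions, as produced by truncate().
    """
    counts = Counter()
    for t in transactions:
        for size in range(1, min(max_size, len(t)) + 1):
            counts.update(combinations(t, size))
    return counts


def closed_itemsets(counts):
    """Drop itemsets that have a proper superset (among counts) with equal support."""
    closed = {}
    for itemset, sup in counts.items():
        s = set(itemset)
        if not any(len(o) > len(itemset) and osup == sup and s.issubset(o)
                   for o, osup in counts.items()):
            closed[itemset] = sup
    return closed


def gumbel(rng):
    """Standard Gumbel sample via the inverse CDF; u is kept strictly inside (0, 1)."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return -math.log(-math.log(u))


def noisy_top_k(candidates, k, epsilon, rng):
    """Gumbel top-k on supports, equivalent to k rounds of the exponential mechanism.

    Assumes sensitivity 1 per support count and epsilon / k per round, giving a
    noise scale of 2 * k / epsilon. This is an illustration, not an audited proof.
    """
    scale = 2.0 * k / epsilon
    noisy = {s: sup + scale * gumbel(rng) for s, sup in candidates.items()}
    return sorted(noisy, key=noisy.get, reverse=True)[:k]


def dp_topk_sketch(transactions, k, max_len, max_size, epsilon, seed=0):
    rng = random.Random(seed)
    truncated = truncate(transactions, max_len, rng)
    # Candidate generation below is NOT privatized; only the final selection is noisy.
    candidates = closed_itemsets(count_itemsets(truncated, max_size))
    return noisy_top_k(candidates, k, epsilon, rng)
```

Bounding `max_len` matters for cost. A transaction with L distinct items has 2^L - 1 non-empty subsets, so capping its length caps the number of candidate itemsets it can contribute.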
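The partition-then-merge scheme in result 2) can be sketched with Python's `concurrent.futures`. The sketch makes these assumptions:

- Transactions are split round-robin into `n_parts` partitions. The abstract says only that partitioning follows predefined requirements.
- Each partition counts all itemsets up to `max_size` by brute force instead of running FP-Growth, so summing the local counters gives exact global supports.
- The merged candidates are then split again, and each chunk is checked in parallel for closedness against the global table. Closedness here is relative to itemsets of size at most `max_size`.

The final differentially private selection step is omitted. All function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def count_partition(partition, max_size):
    """Count every itemset of size 1..max_size in one partition (stands in for FP-Growth)."""
    counts = Counter()
    for t in partition:
        items = sorted(set(t))
        for size in range(1, min(max_size, len(items)) + 1):
            counts.update(combinations(items, size))
    return counts


def split(items, n_parts):
    """Round-robin split into n_parts lists (hypothetical partitioning rule)."""
    return [items[i::n_parts] for i in range(n_parts)]


def closed_in_chunk(chunk, all_counts):
    """Keep itemsets in this chunk that have no proper superset of equal global support."""
    closed = {}
    for itemset, sup in chunk:
        s = set(itemset)
        if not any(len(o) > len(itemset) and osup == sup and s.issubset(o)
                   for o, osup in all_counts.items()):
            closed[itemset] = sup
    return closed


def parallel_closed_itemsets(transactions, max_size, n_parts=4):
    parts = split(list(transactions), n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        # Phase 1: count locally per partition, then merge into global supports.
        local = pool.map(count_partition, parts, [max_size] * n_parts)
        total = sum(local, Counter())
        # Phase 2: split the merged candidate set and test closedness per chunk.
        chunks = split(list(total.items()), n_parts)
        closed = {}
        for part_result in pool.map(closed_in_chunk, chunks, [total] * n_parts):
            closed.update(part_result)
    return closed
```

`ThreadPoolExecutor` keeps the sketch portable. For CPU-bound counting in CPython, `ProcessPoolExecutor` offers the same `map` interface. If each partition kept only its locally frequent itemsets, as partition-based miners usually do, a second counting pass over the data would be needed to recover exact global supports.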