Font Size: a A A

Study On The Frequent Itemset Mining Based On Differential Privacy

Posted on:2021-03-12Degree:MasterType:Thesis
Country:ChinaCandidate:S Q ZengFull Text:PDF
GTID:2428330629451227Subject:Electronic Science and Technology
Abstract/Summary:PDF Full Text Request
With the arrival of big data era,a large amount of data and records are spawned from different industries.Mining useful information by processing data and records can be crucial for resource allocation,service improvement,or even guiding the development direction for certain industries.As one of the most important technologies of data mining,frequent itemset mining has been developed and applied profoundly,which has become the key of discovering the association rules among items.However,during the wide spread of data technologies,personal sensitive information is facing unprecedented threats.The traditional 6)anonymity method cannot provide enough protection for the sensitive information from the sophisticated attackers who have more powerful abilities to obtain background knowledge.In 2006,scientist Dwork from Microsoft proposed the differential privacy technology,which provides a solution to this problem.Differential privacy technology leverages the random noise to protect sensitive data.However,the random noise distorts the process of data mining,thereby compromising the validity of mining results.In order to effectively improve the quality of mining results without exposing privacy,this paper proposes a new frequent itemset mining algorithm based on differential privacy technology.The new algorithm is composed of two stages: dataset preprocessing and noisy frequent itemset mining.The main research contents are listed as follows:(1)In the preprocessing stage,this paper proposes a transaction shortening strategy to solve the problem of frequent information loss caused by transaction truncation.This strategy realizes the limit of maximum transaction length to decrease the algorithm sensitivity,and avoids the truncation error.Simulation results prove that the transaction shortening strategy effectively decreases the sensitivity of the algorithm,thereby avoids the excessive noise.(2)In the stage of mining noisy frequent itemset,this paper constructs the mining model based on Apriori.This paper tackles the problems caused by noise in itemset' support.On the one hand,the double thresholds method is used to avoid the transmission error;on the other hand,the modifying support strategy is used to reduce the overall noise in the algorithm results.Simulation results prove that these two strategies improve the quality of algorithm results.(3)According to the definition of differential privacy,this paper proves that the new algorithm satisfies the ? differential privacy protection.The simulation experiment in comparison with SmartTrunc algorithm proves that the new algorithm avoids the truncation error,the relative error of noisy support is reduced about 1/3,and the operation time is shorter.(4)Because the efficiency of the new algorithm is still low,this paper uses the optimized Eclat frequent itemset mining model to replace the original Apriori mining model,which avoids a large volume of invalid access to the dataset.Simulation results prove that the operation time of the improved algorithm is significantly reduced to 1/5,but there is a certain degree of truncation error.There are 25 figures,8 tables and 72 references in this paper.
Keywords/Search Tags:differential privacy, frequent itemset mining, transaction shortening, modifying support, concept lattice
PDF Full Text Request
Related items