Font Size: a A A

Research On Association Rules Algorithm Based On Differential Privacy

Posted on:2023-01-31Degree:MasterType:Thesis
Country:ChinaCandidate:M FanFull Text:PDF
GTID:2568306836969689Subject:Cyberspace security
Abstract/Summary:
Association rule mining is a research hotspot in data mining,but when mining association rules,users’ private information may be leaked.Therefore,it is valuable to provide reliable privacy guarantees without significantly reducing accuracy during association rule mining.Differential privacy,as a data distortion technique,avoids the leakage of sensitive information by adding random noise,and it does not depend on the background knowledge and computing power of the attacker Differential privacy protection data will reduce the accuracy of mining results because differential privacy prevents the disclosure of sensitive information by adding random noise.Therefore,the focus of current research is how to improve the availability of mining results with given privacy budget.In order to improve the accuracy of mining results and the efficiency of the algorithm,two new association rules mining algorithms satisfying the differential privacy are proposed in this thesis.Based on the analysis of the existing differential privacy association rules,it is found that reducing the global sensitivity can effectively improve the availability of mining results,and the global sensitivity of counting queries is related to the longest transaction length of data sets.To this end,this thesis proposes an association rules mining algorithm TS-ARM that satisfies differential privacy,designs a new transaction splitting method to reduce the dimensionality of the dataset,uses a double conditional mechanism to split long transactions,sets two splitting conditions,the splitting length threshold and the splitting support threshold,the splitting support threshold is set to ensure that items with higher support can be divided into the same transaction,thereby reducing the impact of transaction splitting on mining results.Compared to transaction truncation,this method of transaction splitting reduces the sensitivity of the query and the size of the noise required,while reducing the loss of information.Traditional association rules mining algorithms need to scan the database for many times in the mining process.TS-ARM algorithm introduces transaction matrix to avoid repeated database scanning,but it still has the problem of low mining efficiency for large-scale data sets.Aiming at this problem,sampling technology is introduced,and an association rules algorithm SampleTSARM algorithm based on Vapnik-Chervonenkis Dimension to calculate the number of samples is proposed,and the algorithm for calculating the upper bound of Vapnik-Chervonenkis Dimension is improved to further improve the execution efficiency of the algorithm.The sample size required by the approximate association rules obtained by the method can ensure that the mining results are close to the real mining results with a certain probability.Finally,experiments are carried out on four data sets,and the experimental results show that the usability of TS-ARM algorithm and SampleTS-ARM algorithm is higher than the existing differential privacy association rule mining algorithm,and the SampleTS-ARM algorithm has a great improvement in execution efficiency compared with the existing algorithm and TS-ARM algorithm.
Keywords/Search Tags:Differential privacy, frequent itemsets mining, association rules mining, transaction splitting, sampling, Vapnik-Chervonenkis Dimension
Related items