Research On Decision Tree Algorithm Based On Differential Privacy Technology Protection

Posted on: 2024-06-03
Degree: Master
Type: Thesis
Country: China
Candidate: X Qin
GTID: 2558307124986319
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, data has become an essential part of society, so applying data mining technology to extract key information from large volumes of data is very important. A decision tree is a machine learning model used in data mining. Compared with other methods, a decision tree is simple to construct and easy to understand and explain, which has led to its wide use. However, an attacker can infer sensitive information from a decision tree by comparing count results. Because differential privacy offers a rigorous mathematical guarantee of its level of privacy protection without incurring excessive computational cost, applying it to decision trees is an effective approach to privacy-preserving data mining. Work on differentially private decision trees nevertheless still faces challenges in improving model performance and in balancing data security against data availability. This thesis therefore studies both the construction of decision trees and the implementation of differential privacy, with the aim of further improving model performance and the security and availability of the data. The main research work is as follows:

(1) A single decision tree algorithm may fail to select the best model from the training samples, which lowers accuracy. To address this, we adopt the idea of ensemble learning and construct multiple decision trees to form a decision tree ensemble model with higher accuracy. We also improve the single decision tree algorithm to shorten its training time, which reduces the running time of the overall ensemble and improves efficiency. The experimental results show that the improved single-tree algorithm reduces the running time of the model and that the ensemble approach improves its accuracy.

(2) Applying differential privacy to a decision tree reduces its accuracy. To address this, a differentially private decision tree algorithm based on two-step calculation is proposed. The method allocates the privacy budget across the nodes of the decision tree so that the budget consumed by each query reaches a given value, which mitigates the accuracy loss that occurs when a query is allocated too little budget. A two-step calculation strategy is also introduced to enlarge the privacy budget allocated to the leaf nodes and thereby reduce their sensitivity to noise. The experimental results verify that the improved algorithm raises the accuracy of the model while preserving privacy.

(3) The level of privacy protection of a decision tree ensemble model decreases when differential privacy is applied to it. To address this, a decision tree ensemble algorithm based on the parallel composition property of differential privacy is designed. The improved technique realizes parallel composition in the ensemble by randomly sampling the training data set of each decision tree so that the training data sets of the individual trees are disjoint. Experiments show that this method improves the privacy protection of the model while ensuring the availability of the data.
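The ideas in items (2) and (3) can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract does not give the two-step calculation rule, so the `level_budgets` function and its `leaf_share` parameter are hypothetical stand-ins that simply give the leaf level a larger share of the budget. The sketch assumes counting queries with sensitivity 1 protected by the Laplace mechanism, sequential composition across tree levels, and parallel composition across trees trained on disjoint subsets. All function names are chosen for illustration.

```python
import random


def laplace_noise(scale, rng):
    # Laplace(0, scale) sampled as the difference of two i.i.d.
    # exponential variables, each with mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng):
    # Laplace mechanism for a counting query: sensitivity is 1,
    # so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


def level_budgets(epsilon, depth, leaf_share=0.5):
    # Hypothetical allocation (an assumption, not the thesis's rule):
    # the leaf level receives `leaf_share` of the total budget and the
    # `depth` internal levels (depth >= 1) split the remainder evenly.
    # Levels compose sequentially, so the shares sum to epsilon.
    internal = epsilon * (1.0 - leaf_share) / depth
    return [internal] * depth + [epsilon * leaf_share]


def split_disjoint(records, n_trees, rng):
    # Randomly partition the records into n_trees disjoint subsets.
    # Under parallel composition, training one epsilon-DP tree per
    # subset keeps the whole ensemble epsilon-DP.
    shuffled = list(records)
    rng.shuffle(shuffled)
    return [shuffled[i::n_trees] for i in range(n_trees)]


if __name__ == "__main__":
    rng = random.Random(0)
    budgets = level_budgets(epsilon=1.0, depth=3)
    print("per-level budgets:", budgets)
    subsets = split_disjoint(range(20), n_trees=4, rng=rng)
    print("disjoint subsets:", subsets)
    # A leaf holding 12 records reports a noisy count using the leaf budget.
    print("noisy leaf count:", noisy_count(12, budgets[-1], rng))
```

With disjoint subsets, each tree can spend the full budget, whereas training every tree on the same data would require dividing the budget among the trees under sequential composition. A larger leaf budget means a smaller noise scale on the leaf counts, which is the effect the abstract attributes to its two-step strategy.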
Keywords/Search Tags: Decision Tree, Differential Privacy, Two-Step Calculation, Ensemble Learning, Parallel Composition