Research On Decision Tree Classification Based On Differential Privacy

Posted on: 2021-02-25 | Degree: Master | Type: Thesis
Country: China | Candidate: W Wang
GTID: 2518306047482114 | Subject: Software engineering
Abstract/Summary:
The rapid development of information and network communication technology has made data sharing increasingly frequent and has greatly increased the risk of personal privacy leakage. As a result, people attach more and more importance to protecting their private information. In data mining, the traditional decision tree classification method does not protect the data: it focuses only on extracting valuable information from the data set and improving classification accuracy. Differential privacy provides strong data protection, so applying it to decision tree classification is of great significance. This thesis studies decision tree classification methods based on differential privacy, covering noise allocation, the processing of continuous and discrete data, and the application of smooth sensitivity to random forest decision tree algorithms.

First, the classic differentially private decision tree classification methods were studied. The SuLQ-ID3 and DiffP-ID3 algorithms cannot process continuous attributes, and the DiffP-C4.5 algorithm consumes too much privacy budget when processing continuous attributes. To address these problems, this thesis proposes the DPE-C4.5 algorithm. DPE-C4.5 uses a ratio to determine the split points of continuous attributes, which then take part in split-attribute selection together with the discrete attributes, and it uses the exponential mechanism to introduce noise so that the algorithm satisfies differential privacy. Experimental results show that, for the same privacy budget ε, DPE-C4.5 achieves higher classification accuracy than existing decision tree algorithms.

Next, the global sensitivity used in differentially private random forest decision tree algorithms was examined. The global sensitivity of the counting function is 1, and using smooth sensitivity does not reduce the sensitivity of the counting function. However, querying the data while building a decision tree consumes unnecessary privacy budget. To address this problem, this thesis proposes the PRFSen algorithm, which applies smooth sensitivity to a differentially private random forest. When constructing each decision tree, PRFSen uses a ratio to determine the split points of continuous attributes, which then participate in the selection of node attributes. Experimental results show that the proposed algorithm improves classification accuracy under the same privacy budget ε.
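The abstract says DPE-C4.5 relies on the exponential mechanism for split-attribute selection but does not name its scoring function. The Python sketch below shows the standard exponential mechanism, which selects a candidate r with probability proportional to exp(ε·u(r)/(2Δu)). As an assumption, attributes are scored with a "Max" quality function (the sum over branches of the majority-class count), which has sensitivity 1 because adding or removing one record changes at most one class count by 1. The helper names `max_score`, `selection_probabilities`, and `exponential_mechanism`, and the example counts, are hypothetical and not taken from the thesis.

```python
import math
import random
from typing import Dict, Optional, Sequence

def max_score(branch_class_counts: Sequence[Sequence[int]]) -> int:
    """Hypothetical 'Max' quality score: sum of the majority-class count in each branch.

    Adding or removing one record changes one class count in one branch by 1,
    so this score has global sensitivity 1.
    """
    return sum(max(counts) for counts in branch_class_counts if counts)

def selection_probabilities(scores: Dict[str, float], epsilon: float,
                            sensitivity: float) -> Dict[str, float]:
    """Exponential mechanism: P[r] proportional to exp(epsilon * u(r) / (2 * sensitivity))."""
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    top = max(scores.values())  # shift by the max score for numerical stability
    weights = {k: math.exp(epsilon * (v - top) / (2.0 * sensitivity))
               for k, v in scores.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

def exponential_mechanism(scores: Dict[str, float], epsilon: float,
                          sensitivity: float,
                          rng: Optional[random.Random] = None) -> str:
    """Sample one candidate (e.g. a split attribute) using the exponential mechanism."""
    rng = rng or random.Random()
    probs = selection_probabilities(scores, epsilon, sensitivity)
    keys = list(probs)
    return rng.choices(keys, weights=[probs[k] for k in keys], k=1)[0]

# Hypothetical [class_no, class_yes] counts in each branch of two candidate attributes.
scores = {
    "age": max_score([[30, 5], [10, 25]]),      # 30 + 25 = 55
    "income": max_score([[20, 15], [20, 15]]),  # 20 + 20 = 40
}
probs = selection_probabilities(scores, epsilon=1.0, sensitivity=1.0)
chosen = exponential_mechanism(scores, epsilon=1.0, sensitivity=1.0, rng=random.Random(7))
print(probs, chosen)
```

A smaller ε flattens the distribution toward a uniform choice, while a larger ε makes the highest-scoring attribute almost certain to be selected; this is the accuracy/privacy trade-off the experiments vary.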
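Both DPE-C4.5 and PRFSen are described as using a ratio to determine split points of continuous attributes. Assuming this refers to C4.5's information gain ratio, the sketch below evaluates the midpoints between consecutive distinct sorted values and keeps the threshold with the highest gain ratio, so the continuous attribute becomes a binary split that can compete with discrete attributes during attribute selection. It covers only the non-private threshold search; the abstract does not say how the thesis charges privacy budget for this step. The helpers `entropy`, `gain_ratio`, and `best_split_point` are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple

def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values: Sequence[float], labels: Sequence[Hashable], t: float) -> float:
    """C4.5 gain ratio of the binary split `value <= t` versus `value > t`."""
    left = [y for x, y in zip(values, labels) if x <= t]
    right = [y for x, y in zip(values, labels) if x > t]
    if not left or not right:
        return 0.0
    n = len(labels)
    wl, wr = len(left) / n, len(right) / n
    gain = entropy(labels) - wl * entropy(left) - wr * entropy(right)
    split_info = -(wl * math.log2(wl) + wr * math.log2(wr))  # > 0 since both sides non-empty
    return gain / split_info

def best_split_point(values: Sequence[float],
                     labels: Sequence[Hashable]) -> Tuple[Optional[float], float]:
    """Return (threshold, gain ratio) of the best midpoint, or (None, -inf) if no split exists."""
    distinct = sorted(set(values))
    best_t: Optional[float] = None
    best_r = -math.inf
    for a, b in zip(distinct, distinct[1:]):
        t = (a + b) / 2.0  # candidate threshold between consecutive distinct values
        r = gain_ratio(values, labels, t)
        if r > best_r:
            best_t, best_r = t, r
    return best_t, best_r

# Hypothetical continuous attribute values with binary class labels.
xs = [1, 2, 3, 10, 11, 12]
labels = ["no", "no", "no", "yes", "yes", "yes"]
print(best_split_point(xs, labels))
```

Dividing the information gain by the split information penalizes thresholds that isolate only a few records, which is why C4.5 prefers gain ratio over raw information gain.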
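The abstract's observation that the counting function has global sensitivity 1, and that smooth sensitivity does not lower it, can be checked directly. The sketch below adds Laplace noise with scale 1/ε to a count (the standard Laplace mechanism) and computes the β-smooth sensitivity S*(x) = max_k e^{-kβ}·A_k(x), where A_k(x) is the largest local sensitivity among datasets within distance k of x. For a count every A_k equals 1, so the k = 0 term already gives the maximum value of 1. The function names, the truncation at `max_k`, and the example records are illustrative assumptions; this is not the PRFSen algorithm itself.

```python
import math
import random
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(records: Iterable[T], predicate: Callable[[T], bool],
                epsilon: float, rng: random.Random) -> float:
    """epsilon-DP count via the Laplace mechanism (global sensitivity of a count is 1)."""
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0  # adding or removing one record changes the count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)

def smooth_sensitivity(local_sens_within: Callable[[int], float],
                       beta: float, max_k: int) -> float:
    """beta-smooth sensitivity max_k e^{-k*beta} * A_k, truncated at distance max_k.

    local_sens_within(k) must return A_k, the largest local sensitivity among
    datasets at distance at most k. Truncation is exact only when larger k
    cannot raise the maximum, which holds for counts.
    """
    return max(math.exp(-k * beta) * local_sens_within(k) for k in range(max_k + 1))

# For a count, the local sensitivity is 1 at every dataset, so A_k = 1 for all k.
count_smooth = smooth_sensitivity(lambda k: 1.0, beta=0.1, max_k=50)

rng = random.Random(42)
records = [{"label": "yes"}] * 30 + [{"label": "no"}] * 70  # hypothetical data, true count 30
samples = [noisy_count(records, lambda r: r["label"] == "yes", 1.0, rng)
           for _ in range(5000)]
mean_noisy = sum(samples) / len(samples)
print(count_smooth, round(mean_noisy, 2))
```

Because A_k never exceeds 1 for a count, switching from global to smooth sensitivity leaves the noise scale for counting queries unchanged, which matches the point made in the abstract.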
Keywords/Search Tags: Differential privacy protection, Exponential mechanism, Smooth sensitivity, Decision tree