Research On Ensemble Classification Methods With Differential Privacy

Posted on: 2022-06-04 | Degree: Master | Type: Thesis
Country: China | Candidate: X W Sun
GTID: 2518306338461514 | Subject: Computer Science and Technology
Abstract/Summary:
In the era of big data and advanced information technology, data mining has developed rapidly. However, training data mining models may involve personal information, and improper use of such data can cause serious privacy leakage, harming individuals and even society. It also greatly hinders the development of data sharing and data mining technology. Privacy-preserving data mining has therefore become an active research direction, whose core task is to balance the conflict between privacy protection and model accuracy. Differential privacy provides a rigorous, mathematically defined, and practical privacy protection mechanism, opening a new direction for privacy-preserving data mining. For the ensemble classification algorithms considered in this thesis, we study and analyze how differential privacy is applied to the corresponding data mining algorithms and where the design focus lies, as well as the privacy budget allocation strategy used in their implementation, so that the proposed algorithms balance privacy against model usability. The main work of this thesis can be summarized as follows:

(1) To improve the usability of differentially private decision trees, we design a depth-based privacy budget allocation strategy that reserves as much of the privacy budget as possible for the leaf nodes. This avoids the problem of the traditional hierarchical equal-allocation strategy, in which noise accumulates as the number of levels increases. In addition, we analyze the implementation mechanisms for the different types of nodes when constructing a differentially private decision tree.

(2) Building on this work, we propose a random forest construction algorithm that satisfies differential privacy, and improve the usability of the random forest model through a selection-based adaptive ensemble scheme.

(3) To address privacy protection in distributed data mining, we combine differential privacy with the AdaBoost algorithm and propose a distributed boosted decision tree algorithm under differential privacy. Following the idea of distributed ensembles, the algorithm strengthens the use of each participant's data distribution, and an adaptive model integration mechanism is designed so that the algorithm improves classification accuracy while satisfying differential privacy.

Finally, theoretical analysis and experiments verify that the proposed algorithms maintain the accuracy and usability of their results while providing privacy protection.
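The abstract does not give the exact allocation formula or node mechanisms, so the following Python sketch is a hypothetical illustration of contribution (1), not the thesis's algorithm. It assumes a tree of depth d with levels 0..d, where level d holds the leaves. Nodes on the same level cover disjoint records, so under parallel composition each level costs only its own budget ε_i, and under sequential composition the level budgets must sum to the total ε. The sketch contrasts an equal split across levels with an assumed geometric scheme (the `ratio` parameter is hypothetical) that gives the leaf level the largest share. It also assumes, as is common in differentially private decision trees, that internal nodes choose split attributes with the exponential mechanism and that leaves release class counts with Laplace noise. All function names and the candidate attributes are illustrative.

```python
import math
import random


def equal_allocation(total_eps: float, depth: int) -> list[float]:
    """Hierarchical equal split: each of the depth + 1 levels gets the same share."""
    return [total_eps / (depth + 1)] * (depth + 1)


def geometric_allocation(total_eps: float, depth: int, ratio: float = 2.0) -> list[float]:
    """Hypothetical depth-based split: level i's share is proportional to ratio**i,
    so deeper levels, and above all the leaf level, receive more budget."""
    weights = [ratio ** i for i in range(depth + 1)]
    total_weight = sum(weights)
    return [total_eps * w / total_weight for w in weights]


def exponential_mechanism(scores: dict[str, float], eps: float,
                          sensitivity: float, rng: random.Random) -> str:
    """Pick a candidate with probability proportional to
    exp(eps * score / (2 * sensitivity))."""
    keys = list(scores)
    top = max(scores.values())  # shift by the max score so exp() cannot overflow
    weights = [math.exp(eps * (scores[k] - top) / (2 * sensitivity)) for k in keys]
    return rng.choices(keys, weights=weights, k=1)[0]


def laplace_noise(scale: float, rng: random.Random) -> float:
    """The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale)."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_leaf_counts(counts: dict[str, int], eps: float,
                      rng: random.Random) -> dict[str, float]:
    """Release a leaf's class histogram with Laplace(1 / eps) noise per class.
    Assumes add/remove-one neighbouring datasets, so the L1 sensitivity is 1."""
    return {label: c + laplace_noise(1.0 / eps, rng) for label, c in counts.items()}


if __name__ == "__main__":
    rng = random.Random(42)
    depth, total_eps = 3, 1.0
    for name, alloc in [("equal", equal_allocation(total_eps, depth)),
                        ("geometric", geometric_allocation(total_eps, depth))]:
        leaf_eps = alloc[-1]
        print(f"{name}: leaf eps = {leaf_eps:.3f}, Laplace scale = {1 / leaf_eps:.3f}")

    alloc = geometric_allocation(total_eps, depth)
    # Root node: hypothetical count-based split scores, assumed to have sensitivity 1.
    split = exponential_mechanism({"age": 12.0, "income": 9.0, "zip": 3.0},
                                  eps=alloc[0], sensitivity=1.0, rng=rng)
    print("chosen split:", split)
    # Leaf node: noisy class counts using the leaf-level budget.
    print("noisy leaf:", noisy_leaf_counts({"yes": 30, "no": 5}, alloc[-1], rng))
```

With depth 3 and total ε = 1, the equal split leaves ε = 0.25 (Laplace scale 4) for the leaves, while the geometric split leaves 8/15 ≈ 0.533 (scale 1.875), so the leaf counts that drive predictions are roughly half as noisy. Setting `ratio=1.0` reproduces the equal split, so the ratio controls how aggressively budget is shifted from split selection toward the leaves; the trade-off is noisier split choices near the root.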
Keywords/Search Tags: ensemble learning, differential privacy, decision tree, decision forest, AdaBoost