Research On Gradient Boosting Decision Tree Federated Learning Algorithm For Privacy Protection

Posted on: 2024-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: J Ma
Full Text: PDF
GTID: 2568306944962649
Subject: Computer Science and Technology
Abstract/Summary:
In modern society, decentralized institutional autonomy, inter-industry competition, and user data privacy protection mean that data is often scattered across different companies or organizations and cannot be directly shared or used, a problem known as "data islands". Federated learning offers a way to train a machine learning model collaboratively across multiple parties without sharing raw data, which alleviates privacy concerns to some extent. However, attackers can still infer participants' original data from intermediate information exchanged during training, such as gradients. Gradient Boosted Decision Tree (GBDT) is a widely used machine learning algorithm because it is non-parametric, fast, and accurate. It is applied in search, financial, and medical scenarios, where the privacy of training and deployment is also very important. Federated learning can be divided into vertical and horizontal federated learning according to how the data is distributed. Existing GBDT training schemes for vertical federated learning usually assume that the labels are held by a single party, and they suffer from incomplete privacy analysis and significant accuracy loss. Existing GBDT training schemes for horizontal federated learning struggle to balance communication efficiency and model accuracy while ensuring data privacy. This thesis conducts a comprehensive study of privacy preservation for GBDT in both federated learning scenarios and achieves the following results:

(1) In the vertical federated learning scenario, the participants share the same users but hold different features. This thesis proposes PVD-GBDT, a label-privacy-preserving approach for the distributed-label vertical federated learning scenario, which is the more common case in real life. By decomposing and analyzing the information transmission process, and by using partially homomorphic encryption and threshold partially homomorphic encryption, it designs a basic scheme for the setting with a semi-trusted third party and an enhanced scheme for the setting without a trusted third party. Both schemes preserve the privacy of labels during training while keeping the model quality nearly lossless. Experimental results show that PVD-GBDT achieves almost lossless training quality while preserving label privacy and keeping training overhead under control, and that it is practical to use.

(2) In the horizontal federated learning scenario, the participants hold the same features but different users. This thesis proposes the PH-eGBDT scheme for privacy-preserving GBDT training in this scenario. To handle feature values being scattered across the participating parties, it first uses differential privacy to construct global weighted quantiles that preserve data privacy. It then proposes two histogram sampling schemes, one based on client sampling and one based on feature sampling, to reduce the high communication and time overhead caused by secure aggregation and to meet system efficiency requirements. Experimental results show that PH-eGBDT outperforms existing schemes in model accuracy and training overhead while ensuring data privacy.
Keywords/Search Tags: Federated learning, Gradient Boosted Decision Tree, privacy preservation, homomorphic encryption, differential privacy
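
The abstract does not say which partially homomorphic cryptosystem the vertical scheme uses. As a rough illustration of why additive homomorphism helps here, the sketch below assumes Paillier, a common additively homomorphic scheme: a party can sum encrypted gradients without seeing any individual value, and only the key holder decrypts the total. The names keygen, encrypt, add, decrypt, and SCALE are hypothetical, the primes are toy-sized and insecure, and this is not the PVD-GBDT protocol itself.

```python
import math
import random

SCALE = 1000  # assumed fixed-point scale for encoding float gradients


def keygen(p=1789, q=1861):
    """Toy Paillier key pair. These primes are far too small to be secure."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    # With generator g = n + 1, the decryption constant is lam^-1 mod n.
    mu = pow(lam, -1, n)  # modular inverse, needs Python 3.8+
    return n, (lam, mu)


def encrypt(n, value):
    """Encrypt a float gradient as a fixed-point integer modulo n."""
    m = round(value * SCALE) % n  # negative values wrap around modulo n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    n2 = n * n
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def add(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


def decrypt(n, priv, c):
    """Recover the plaintext and map it back to a signed float."""
    lam, mu = priv
    m = ((pow(c, lam, n * n) - 1) // n) * mu % n
    if m > n // 2:  # upper half of the range encodes negative values
        m -= n
    return m / SCALE


if __name__ == "__main__":
    n, priv = keygen()
    gradients = [0.25, -0.5, 0.125]  # made-up per-sample gradients
    total = encrypt(n, 0.0)
    for g in gradients:
        total = add(n, total, encrypt(n, g))  # aggregation on ciphertexts only
    print(decrypt(n, priv, total))
```

In this sketch the aggregating party handles only ciphertexts, which is the property that lets gradient or label-derived statistics be combined without exposing individual entries. A threshold variant, as mentioned in the abstract, would additionally split the decryption key among parties; that is not shown here.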