| Gradient boosting decision trees (GBDT) have become a popular machine learning algorithm and have excelled in many data mining competitions and practical industrial applications thanks to their strong results in classification, ranking, prediction, and other tasks. As traditional machine learning is applied more widely, it is increasingly constrained by the availability of training data. To train high-quality models, data that is decentralized and owned by different parties must overcome constraints on privacy, communication, benefit distribution, and security, which calls for a new computing model as an alternative to centralized, unified processing. Federated learning aims to reduce privacy risks and costs by letting entities keep their data local while collaboratively training models under a unified orchestration service. However, existing federated learning systems based on the widely used gradient boosting decision trees fail to strike a good trade-off between accuracy and communication. In addition, current federated gradient boosting system designs ignore an important aspect: fairness, that is, the reasonable distribution of benefits according to the contributions of different federation members to the federated model. Whether this challenge is solved directly affects whether the federated learning paradigm can be used in practical application scenarios. In summary, current federated gradient boosting decision tree systems have not yet addressed the key issues of model availability and benefit distribution described above. This thesis studies a blockchain-based federated gradient boosting decision tree scheme that achieves constant communication overhead and good model performance, quantifies the contributions of all parties, and provides a fair and credible coordination platform. The thesis comprises the following research contents: (1) This thesis studies FV-tree, a federated gradient boosting decision tree training method. By replacing the tree-based
communication scheme with a purely gradient-based scheme, the intermediate gradient information is compressed to a small size, and a privacy space decomposition technique together with a gradient refit strategy alleviates the model performance degradation that traffic compression causes on skewed datasets. Finally, this thesis provides a differential privacy protection scheme for FV-tree, and experiments verify that the method performs well on large-scale datasets. (2) Then, building on research on the Shapley value from cooperative game theory, this thesis analyzes the training process of gradient boosting decision trees and introduces a novel quantitative index for contribution allocation, named Split Shapley. It uses the gain calculation from gradient boosting decision tree training to quantify the contributions of different participants in the alliance from the relatively limited gradient update summaries exchanged during federated training. Split Shapley has a fairness guarantee and provides a basis for the alliance to settle currency rewards. (3) Finally, this study combines the federated gradient boosting decision tree algorithm, the contribution quantification mechanism, and blockchain, and studies a distributed consensus mechanism based on gradient histogram verification. FGBDT-Chain, a closed-loop federated gradient boosting machine system, is implemented in a permissioned chain environment, using smart contracts to coordinate the computation process and allocate training contribution indexes. Comprehensive experiments on public datasets show that the scheme achieves a good balance among model accuracy, communication overhead, fairness, and security on large-scale skewed datasets. |
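
The abstract does not give the definition of Split Shapley, so the following is only a minimal sketch of the classical Shapley value from cooperative game theory on which it builds, not the thesis's method. The function name `shapley_values`, the participants `"A"` and `"B"`, and the coalition gains in `toy_gain` are all hypothetical and chosen purely for illustration; in a federated GBDT setting the value function might be derived from split gains, but that mapping is an assumption here.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player.

    `value` maps a frozenset of players (a coalition) to its total gain.
    This enumerates every coalition, so it is only practical for a
    small number of players.
    """
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):  # k = size of the coalition that p joins
            # Probability weight of a size-k coalition preceding p
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p to coalition s
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


# Hypothetical coalition gains (illustrative numbers, not from the thesis)
toy_gain = {
    frozenset(): 0.0,
    frozenset({"A"}): 4.0,
    frozenset({"B"}): 2.0,
    frozenset({"A", "B"}): 10.0,
}

phi = shapley_values(["A", "B"], toy_gain.__getitem__)
print(phi)
```

In this toy game the two shares add up to the gain of the full coalition, which is the efficiency property that makes the Shapley value a natural basis for dividing rewards among participants.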