Federated learning can solve the data silo problem of traditional machine learning: each participant's raw data can be used to train machine learning models without leaving its local environment. In reality, however, the participants' datasets may differ substantially in both instance space and feature space, which makes training the federated model difficult and reduces its prediction accuracy. In addition, data exchanged during federated training, such as gradients and local models, may indirectly leak participants' private information, so privacy protection strategies must be introduced. To address these issues, this paper carries out the following research:

(1) An Instance Similarity based Federated Transfer Learning method (IS-FTL) is proposed for the case where participants' datasets differ substantially in instance space and feature space, degrading the federated model's prediction accuracy. Hash values of participant instances are computed with a Boundary-Expanding Locality-Sensitive Hashing algorithm to build a hash table and a similarity matrix, which are then used to mine the similarity between participants' instances. Each participant increases the weight of its local data according to instance similarity, realizing instance-based federated transfer learning. The method is implemented on the XGBoost gradient boosting tree model. Experiments show that IS-FTL improves the prediction accuracy of the federated model.

(2) To address the lack of privacy protection in IS-FTL, a Differential Privacy based Federated Transfer Learning method (DP-FTL) is proposed, focused on the key points of privacy protection for tree models. Differential privacy is used to protect the gradient information exchanged by the participants, the node-splitting process, and the output values of leaf nodes. Because the sensitivity bounds of the split-node utility function and of the leaf-node output values are difficult to determine when differential privacy is applied to tree models, the method introduces a sensitivity calculation based on the maximum gradient. A privacy budget allocation strategy based on information entropy adjustment (EA-APA) is proposed to improve on traditional allocation strategies. Theoretical analysis shows that DP-FTL satisfies ε-differential privacy, and experiments show that combining DP-FTL with EA-APA reduces the model's prediction error and improves its stability.
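As a rough illustration of the similarity-mining step in (1), the sketch below uses plain random-hyperplane LSH to hash instances, count signature collisions between two parties, and derive per-instance weights. The boundary-expanding variant and the exact weighting rule of IS-FTL are not reproduced here; `n_planes`, `base`, and `scale` are illustrative parameters.

```python
import numpy as np

def lsh_signatures(X, n_planes=8, seed=0):
    # Random-hyperplane LSH: each instance gets a bit signature;
    # instances with many matching bits are likely to be similar.
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((X.shape[1], n_planes))
    return (X @ planes > 0).astype(np.uint8)

def similarity_matrix(sig_a, sig_b):
    # Entry (i, j) counts matching hash bits between instance i of
    # party A and instance j of party B (more matches = more similar).
    return (sig_a[:, None, :] == sig_b[None, :, :]).sum(axis=2)

def instance_weights(sim, n_planes, base=1.0, scale=1.0):
    # Up-weight local instances that resemble the other party's data,
    # so the boosted trees transfer knowledge from similar instances.
    closeness = sim.max(axis=1) / n_planes
    return base + scale * closeness
```

The resulting weights could then be supplied to XGBoost, e.g. through the `weight` argument of `xgboost.DMatrix`, so that similar instances contribute more to each boosting round.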
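The leaf-node protection in (2) can be sketched with the Laplace mechanism. The sensitivity bound `g_max / lam` and the entropy-based budget split below are simplified assumptions for illustration, not the exact formulas of DP-FTL or EA-APA.

```python
import numpy as np

def leaf_value(grads, hess, lam=1.0):
    # Standard XGBoost leaf weight: -G / (H + lambda).
    return -grads.sum() / (hess.sum() + lam)

def dp_leaf_value(grads, hess, eps, g_max, lam=1.0, rng=None):
    # Laplace mechanism on the leaf output. Assumed sensitivity bound:
    # adding or removing one instance shifts the leaf value by at most
    # g_max / lam when gradients are clipped to [-g_max, g_max].
    rng = rng or np.random.default_rng()
    sensitivity = g_max / lam
    return leaf_value(grads, hess, lam) + rng.laplace(0.0, sensitivity / eps)

def entropy_adjusted_budget(total_eps, leaf_counts):
    # Hypothetical entropy-style allocation: leaves holding fewer
    # instances (higher surprisal) receive a larger share of epsilon,
    # since their outputs are more easily distorted by noise.
    p = np.asarray(leaf_counts, dtype=float) / sum(leaf_counts)
    info = -np.log(p)
    share = info / info.sum()
    return total_eps * share
```

By the sequential composition property of differential privacy, the per-leaf budgets produced this way sum to the total budget, so the overall release still satisfies ε-differential privacy under the stated sensitivity assumption.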