
Research on Data Integration Based on Random Forest

Posted on: 2024-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: P C Xiang
Full Text: PDF
GTID: 2568307088951019
Subject: Statistics
Abstract/Summary:
Facing either enormous datasets that exceed the training capacity of a single machine, or data stored across multiple clients that cannot be aggregated due to privacy protection, researchers have developed methods for multi-client joint training under distributed conditions, and federated learning is one of the major tools. The current mainstream federated frameworks based on neural networks and decision trees rely on multiple rounds of exchanging parameters such as gradients among sites to achieve strong performance. However, the complex learning process, high learning cost, and opacity of the trained model limit the application of traditional federated learning methods in many distributed scenarios, and distributed learning research still faces many difficulties. For realistic distributed environments with a small number of sites, restricted parameter transfer between sites, and inconsistent observed features across sites, current distributed learning methods appear weak and lack simple, efficient remedies. For this reason, we aim to establish a simpler data integration framework based on random forests that effectively solves the distributed learning problem in these scenarios.

In this paper, we develop a one-shot federated transfer learning method using random forests (FTRF) to improve prediction accuracy at a target data site by leveraging information from auxiliary sites. Unlike the traditional federated strategy that requires multiple rounds of transmitting parameters such as gradients, FTRF adopts a model averaging idea that needs only a single round of communication between the target and the auxiliary sites, ensuring communication efficiency. Moreover, thanks to the variable-importance measures of random forests, the information transfer performed by FTRF can be interpreted through a similarity path between the target data and the auxiliary sites. Only fitted models from the auxiliary sites are sent to the target site. Different from traditional model averaging, we combine the predicted outcomes from other sites with the original variables when estimating the model averaging weights. By incorporating these augmented variables into a random forest, we obtain a variable-dependent weighting that better exploits the auxiliary-site models to improve prediction. Both theoretical and numerical results show that the proposed federated transfer learning approach is at least as accurate as the model trained on the target data alone, regardless of possible data heterogeneity, including imbalanced and non-IID data distributions across sites and model misspecification. Six real-world data examples show that FTRF reduces prediction error by 2-40% compared with methods that do not use auxiliary information.
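The one-shot scheme described above (auxiliary sites ship fitted forests once; the target augments its own covariates with their predictions and fits a random forest to obtain a variable-dependent combination) can be illustrated with a minimal sketch. This is not the thesis's actual FTRF implementation: it uses scikit-learn random forest regressors as stand-ins, and the function names (train_auxiliary_model, augment_with_auxiliary_predictions, fit_target_ensemble) are illustrative assumptions.

```python
# Minimal sketch of a one-shot federated-transfer idea in the spirit of FTRF.
# Assumes numpy arrays and scikit-learn; names and defaults are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def train_auxiliary_model(X_aux, y_aux, seed=0):
    """Each auxiliary site fits its own random forest locally and sends
    only the fitted model to the target site (a single communication round)."""
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(X_aux, y_aux)
    return model

def augment_with_auxiliary_predictions(X_target, aux_models):
    """At the target site, append each auxiliary model's predictions on the
    target covariates as extra columns of the design matrix."""
    aux_preds = [m.predict(X_target).reshape(-1, 1) for m in aux_models]
    return np.hstack([X_target] + aux_preds)

def fit_target_ensemble(X_target, y_target, aux_models, seed=0):
    """Fit a random forest on the augmented target data, so the combination
    of local features and auxiliary predictions can vary with the covariates."""
    X_aug = augment_with_auxiliary_predictions(X_target, aux_models)
    ensemble = RandomForestRegressor(n_estimators=200, random_state=seed)
    ensemble.fit(X_aug, y_target)
    return ensemble

def predict_target(ensemble, X_new, aux_models):
    """Predict at new target points using the same augmentation."""
    X_aug = augment_with_auxiliary_predictions(X_new, aux_models)
    return ensemble.predict(X_aug)
```

In this sketch, the target-site forest implicitly decides, region by region of the feature space, how heavily to rely on each auxiliary model's prediction, which mirrors the variable-dependent (adaptive) weighting described in the abstract.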
Keywords/Search Tags: Federated Transfer, Model Averaging, Random Forest, Adaptive Weighting, One-shot Communication