
Privacy-Preserving Federated Learning Feature Engineering

Posted on: 2023-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: X T Lu
Full Text: PDF
GTID: 2568307052495924
Subject: Engineering
Abstract/Summary:
Large amounts of multi-party, multi-source data can significantly improve the applicability and accuracy of research results. With the growing emphasis on data security and privacy, and with data owners forming isolated data silos, there is an increasing need for collaborative computation over cross-institutional data. How to fully exploit the value of such data while satisfying privacy, security, and regulatory requirements has therefore become an urgent problem. Federated learning, which protects both privacy and data security, has emerged in response: it is well suited to the joint analysis and computation of cross-institutional data, exploiting the value of the data while ensuring privacy.

Most existing federated learning schemes, however, do not consider the importance of individual features for modeling, so the trained global models fall short of optimal. Since data and features largely determine the upper bound on a federated model's predictive performance, feature engineering is a key component of federated learning. Existing feature engineering solutions either share plaintext data, which raises security problems, or process features on ciphertext, which raises efficiency problems, and they are not applicable to participants that hold no label information. In view of these problems, this thesis studies privacy-preserving feature engineering for federated learning and evaluates the proposed techniques on publicly available data. The main work and contributions of this thesis are as follows:

· This thesis presents SFEFL, a feature engineering framework based on secure multiparty computation. Unlike other frameworks, SFEFL supports collaborative feature processing by multiple parties without feature data ever leaving each party's local environment, which effectively secures the feature engineering stage of federated learning. Well-known feature engineering computations, such as information value, the Pearson correlation coefficient, and principal component analysis, are computed securely and more efficiently than in existing schemes through a new encoding method that shifts the computational workload from participants holding label information to those holding only feature data. The security of the feature engineering algorithms built on SFEFL is demonstrated in the standard simulation paradigm. In addition, experimental results show that the new encoding effectively reduces the running time and communication overhead of the federated learning process.

· This thesis proposes WoEVFL, a federated learning framework based on weight of evidence. Unlike SFEFL and other feature engineering frameworks that consider only feature selection, WoEVFL also accounts for the importance of feature replacement in federated modeling. In particular, WoEVFL provides a privacy-preserving method for computing a weight-of-evidence feature matrix that replaces the original features during model training, keeping the original training data confidential and thus further strengthening the security of federated learning.

· This thesis proposes FedGBM, a privacy-preserving federated learning framework based on gradient boosting decision trees (GBDT). Unlike SFEFL and WoEVFL, which can each handle only one type of variable, FedGBM handles in particular the cases where feature scales differ widely or where binary and continuous features coexist. Specifically, FedGBM uses LightGBM as the boosting model for feature selection and protects gradient privacy with a symmetric encryption mechanism. Security analysis shows that FedGBM leaks no gradient information; in addition, it effectively prevents plaintext data from being eavesdropped on or tampered with during transmission. Experimental results show that FedGBM is significantly more efficient than other existing GBDT-based federated learning frameworks while guaranteeing the same accuracy.
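The abstract does not spell out the secure multiparty computation that SFEFL builds on. As a minimal, hedged illustration of the standard building block such frameworks typically rest on, the sketch below implements additive secret sharing: each party splits a private value into random shares, so joint sums can be computed without any party revealing its input. All names and the modulus here are illustrative; this is not SFEFL's actual protocol or encoding.

```python
import random

MOD = 2**61 - 1  # illustrative working modulus; a real protocol fixes a field suited to the task

def share(x, n_parties=2):
    """Split a private integer x into n additive shares that sum to x mod MOD."""
    shares = [random.randrange(MOD) for _ in range(n_parties - 1)]
    shares.append((x - sum(shares)) % MOD)
    return shares

def reconstruct(shares):
    """Recombine shares; only the complete set reveals the secret."""
    return sum(shares) % MOD

# Two parties jointly sum private values 17 and 25 without revealing them:
# each input is shared, each party locally adds the shares it holds,
# and only the final per-party sums are combined.
a_shares = share(17)
b_shares = share(25)
local_sums = [(a + b) % MOD for a, b in zip(a_shares, b_shares)]
assert reconstruct(local_sums) == 42
```

Sums of this kind are the basic primitive from which statistics such as correlations can be assembled securely; the encoding contribution described above concerns where that work is placed among the participants.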
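Weight of evidence (WoE) and information value (IV), which SFEFL computes securely and WoEVFL uses to replace raw features, are standard feature-scoring statistics. The plaintext sketch below shows what is being computed; the frameworks in the thesis perform this under privacy protection, and the function name, WoE sign convention, and smoothing constant are illustrative choices.

```python
import math

def woe_iv(feature_bins, labels):
    """Plaintext weight-of-evidence / information-value sketch.

    feature_bins: per-sample bin index of a discretized feature
    labels: binary labels (1 = event, 0 = non-event)
    Returns a per-bin WoE dict and the feature's total IV.
    """
    total_pos = sum(labels)
    total_neg = len(labels) - total_pos
    woe, iv = {}, 0.0
    for b in sorted(set(feature_bins)):
        pos = sum(1 for x, y in zip(feature_bins, labels) if x == b and y == 1)
        neg = sum(1 for x, y in zip(feature_bins, labels) if x == b and y == 0)
        # small smoothing term avoids log(0) on one-sided bins
        p = (pos + 0.5) / (total_pos + 0.5)  # share of events in this bin
        q = (neg + 0.5) / (total_neg + 0.5)  # share of non-events in this bin
        woe[b] = math.log(p / q)
        iv += (p - q) * woe[b]
    return woe, iv

# A perfectly separating bin assignment yields a large IV ...
_, iv_strong = woe_iv([0, 0, 1, 1], [1, 1, 0, 0])
# ... while an uninformative one yields (near) zero.
_, iv_none = woe_iv([0, 0, 1, 1], [1, 0, 1, 0])
assert iv_strong > 1.0 and abs(iv_none) < 1e-9
```

In WoEVFL, a matrix of such WoE values stands in for the original feature columns during training, which is why the raw data never needs to be exposed.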
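FedGBM's symmetric encryption of gradients is not specified in the abstract. As a toy illustration of the idea (a shared key lets parties exchange gradients that an eavesdropper on the channel cannot read), the sketch below derives a keystream from SHA-256 in counter mode and XORs it with the serialized gradients. This is purely illustrative, not FedGBM's actual mechanism: a production system would use an authenticated cipher such as AES-GCM, which also covers the tampering threat mentioned above, and would never reuse a keystream.

```python
import hashlib
import struct

def _keystream(key: bytes, length: int) -> bytes:
    """Derive a pseudorandom keystream by hashing key || counter (counter mode)."""
    blocks, counter = [], 0
    while sum(len(b) for b in blocks) < length:
        blocks.append(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return b"".join(blocks)[:length]

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; applying it twice with the same key decrypts."""
    ks = _keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))

# A party serializes its per-sample gradients and encrypts them before sending.
gradients = [0.25, -1.5, 0.75]
plaintext = struct.pack(f"{len(gradients)}d", *gradients)
key = b"shared-session-key"  # illustrative; key agreement is out of scope here
ciphertext = xor_cipher(key, plaintext)
assert ciphertext != plaintext
assert list(struct.unpack("3d", xor_cipher(key, ciphertext))) == gradients
```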
Keywords/Search Tags: Federated Learning, Privacy Protection, Privacy Computing, Feature Engineering