
Research On Privacy-preserving Collaborative Machine Learning

Posted on: 2024-01-21    Degree: Doctor    Type: Dissertation
Country: China    Candidate: L Song    Full Text: PDF
GTID: 1528306941998679    Subject: Computer Science and Technology

Abstract/Summary:
With the rapid development of information technology and the popularity of network edge devices, data is generated in a multi-point, distributed manner and has shown explosive growth; collaborative machine learning therefore has broad development prospects. In collaborative machine learning, multiple participants train local models on their private data and exchange model information or intermediate computation results to learn a globally shared model. Collaborative machine learning has become a new paradigm for solving the problem of data islands and establishing data collaboration. However, studies have demonstrated that sensitive user information can still be inferred by analyzing the messages passed between participants, so collaborative machine learning still faces the risk of privacy leakage. To address the privacy issues in existing collaborative machine learning, this dissertation conducts research according to the different distribution forms of data and proposes privacy-preserving collaborative machine learning methods that take model utility, computational efficiency, and system overhead into account while protecting user privacy. The main contributions of this dissertation are summarized as follows:

Firstly, in the scenario where data is horizontally distributed, the model parameters uploaded by clients leak user privacy during the model aggregation process of federated learning. To solve this problem, a secure federated aggregation method based on knowledge distillation and the shuffle model is proposed. Clients upload prediction labels on a public dataset instead of uploading local model parameters, which prevents the server from inferring sensitive user information from client model parameters. In addition, the proposed method extends existing federated learning to an encode-shuffle-analyze security architecture by introducing local differential privacy and a shuffle model, making the prediction labels uploaded by clients anonymous. Theoretical derivation proves that the proposed method satisfies differential privacy and gives an error bound. Experimental results show that the shuffle model can provide privacy amplification and improve communication efficiency while maintaining model performance.

Secondly, in the scenario where large-scale graph data is horizontally distributed, existing federated learning methods based on perturbing model parameters struggle to balance privacy protection and model utility. To solve this problem, a graph-based federated learning method with low-dimensional space perturbation is proposed. The proposed method freezes the graph embedding layer to keep the feature extraction process local and to reduce communication overhead. Additionally, the model parameters are mapped to a low-dimensional space before perturbation is added, so the scale of the required noise is reduced. By alleviating model perturbation, the proposed method improves model accuracy at the same privacy protection level. Experiments conducted on social network datasets show that the proposed method balances model utility, privacy protection, and communication efficiency.

Thirdly, in the scenario where data is vertically distributed, existing collaborative logistic regression methods are not secure against semi-honest participants. To solve this problem, two privacy-preserving collaborative logistic regression methods, one for two participants and one for multiple participants, are proposed. The proposed methods include a privacy-preserving training process and a privacy-preserving prediction process. The two-participant method ensures that both participants complete gradient descent and model updates under ciphertext by applying homomorphic encryption. In the multi-participant method, messages are transmitted through a global server, and secure multi-party
computation and additive homomorphic encryption are used to achieve secure computing. Prediction under ciphertext is realized to ensure that the model deployer cannot obtain user data. Experiments on multiple datasets, together with security and performance analyses, show that both proposed collaborative logistic regression methods are secure against semi-honest participants and protect privacy while maintaining model accuracy.

Finally, in the scenario where data is unevenly distributed, traditional transfer learning suffers from the mutual exposure of source-domain and target-domain data. To solve this problem, a privacy-preserving collaborative unsupervised transfer learning method is proposed. The proposed method changes the centralized transfer learning architecture into a federated learning architecture by introducing a global server. The source domain and the target domain are trained locally as clients and communicate with the global server to complete model training and knowledge transfer. In addition, homomorphic encryption is used to ensure the security of transmission and computation, so ciphertext messages can be exchanged among the source domain, the target domain, and the server, and model updates and domain adaptation can be performed under ciphertext. Security analysis proves that the source domain and the target domain cannot obtain each other's private information during collaboration, and that the method is secure against a semi-honest server and a malicious third party. Experimental results show that the proposed method realizes secure knowledge transfer without loss of accuracy.

The research results of this dissertation on privacy-preserving collaborative machine learning will help to further implement privacy computing in the mode of "data is available but not visible". While satisfying user privacy protection, data security, and government regulations, they lay a solid foundation for solving data isolation problems and achieving cross-organizational data cooperation, providing critical support for building a future of collaborative interconnection and data circulation.
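The first contribution has clients report locally perturbed prediction labels through a shuffler. A minimal sketch of that pipeline follows, assuming k-ary randomized response as the local randomizer (the abstract does not name the specific mechanism); `rr_perturb`, `shuffle`, and `debias_counts` are illustrative names, not the dissertation's API.

```python
import math
import random

def rr_perturb(label, k, epsilon, rng):
    """k-ary randomized response: keep the true label with probability p,
    otherwise report a uniformly random other label."""
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return label
    return rng.choice([c for c in range(k) if c != label])

def shuffle(reports, rng):
    """The shuffler only permutes reports, hiding which client sent which."""
    out = list(reports)
    rng.shuffle(out)
    return out

def debias_counts(reports, k, epsilon):
    """Server-side unbiased estimate of the true label histogram."""
    n = len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    q = (1 - p) / (k - 1)  # probability of reporting any specific wrong label
    counts = [0] * k
    for r in reports:
        counts[r] += 1
    return [(c - n * q) / (p - q) for c in counts]
```

Because the shuffler strips the link between clients and reports, the server sees only an anonymous multiset of labels, which is what enables the privacy amplification mentioned above.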
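The second contribution reduces the scale of differential-privacy noise by perturbing model parameters in a low-dimensional space rather than in the full parameter space. A toy sketch of that idea, assuming a random Gaussian projection as the dimension-reducing map (the actual mapping used in the dissertation may differ):

```python
import random

def project_perturb_reconstruct(grad, m, sigma, rng):
    """Map a d-dimensional parameter vector to m dimensions (m << d),
    add Gaussian noise there, then map back. Only m noise draws are needed
    instead of d, which is the source of the reduced noise scale."""
    d = len(grad)
    # random projection matrix P (d x m) with N(0, 1/m) entries
    P = [[rng.gauss(0, 1.0 / m ** 0.5) for _ in range(m)] for _ in range(d)]
    # low-dimensional representation z = P^T g
    z = [sum(P[i][j] * grad[i] for i in range(d)) for j in range(m)]
    # perturb in the low-dimensional space
    z_noisy = [zj + rng.gauss(0, sigma) for zj in z]
    # map back to the original space: g_tilde = P z_noisy
    return [sum(P[i][j] * z_noisy[j] for j in range(m)) for i in range(d)]
```

The reconstruction is only an unbiased approximation of the original vector, so the projection dimension m trades approximation error against noise scale.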
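The third and fourth contributions both rest on additively homomorphic encryption, which lets ciphertexts be combined so that gradient aggregation and model updates happen under encryption. A textbook Paillier sketch illustrates the additive property; the primes here are deliberately tiny and insecure, and all names are illustrative rather than the dissertation's implementation.

```python
import math
import random

def keygen(p=10007, q=10009):
    """Paillier key generation with toy primes (illustration only, NOT secure)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)       # modular inverse of lambda mod n
    return n, (n, lam, mu)     # public key, secret key

def encrypt(pk, m, rng):
    """c = (1 + n)^m * r^n mod n^2, with generator g = n + 1."""
    n = pk
    n2 = n * n
    r = rng.randrange(2, n)
    while math.gcd(r, n) != 1:  # r must be coprime to n
        r = rng.randrange(2, n)
    return (pow(1 + n, m, n2) * pow(r, n, n2)) % n2

def decrypt(sk, c):
    """m = L(c^lambda mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    n, lam, mu = sk
    n2 = n * n
    return (((pow(c, lam, n2) - 1) // n) * mu) % n

def add_cipher(pk, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return (c1 * c2) % (pk * pk)
```

Additivity (decrypting the product of two ciphertexts yields the sum of the plaintexts) is what allows a server or participant to aggregate encrypted gradient shares without ever seeing the individual contributions.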
Keywords/Search Tags:Collaborative machine learning, Privacy-preserving, Homomorphic encryption, Differential privacy