With the continuous innovation of algorithms and the improvement of hardware, machine learning has made significant progress in artificial intelligence fields such as computer vision, natural language processing, and recommendation systems. This success, however, often depends on large-scale data, which provides rich samples for models and improves training effectiveness. In practical applications, data are often limited in scale or missing important feature information, which seriously degrades model effectiveness. At the same time, because datasets often contain large amounts of users' sensitive personal information (such as race, religion, and political inclination), centralized training that directly shares data can lead to serious privacy breaches. The core idea of federated learning is to protect data privacy: multiple data owners jointly train a model by exchanging only model parameters during training, so that raw data never leaves its owner while the performance of the machine learning model still improves.

Although federated learning has been widely applied in distributed machine learning, existing studies show that it still carries data privacy and security risks. For example, while model parameters and gradients are being shared, a malicious adversary can reconstruct users' local data through reconstruction attacks, leaking private information. A corrupt server may also tamper with model parameters and gradient information, and it is difficult for clients to verify their integrity. Moreover, participants may collude, sharing model parameters and gradients to improve their own model performance while stealing the private information of other participants.

To solve the above problems, this paper studies in depth the balance among privacy protection, algorithmic efficiency, and model accuracy in the two typical federated learning scenarios (horizontal and vertical federated learning), and achieves the following innovative results:

(1) To address the secure aggregation problem in horizontal federated learning, a decentralized federated learning method based on secret sharing and anomaly detection is proposed. The scheme achieves privacy protection by secret-sharing the model parameters among multiple parties and can resist collusion attacks under the "honest-but-curious" adversary assumption. To verify the integrity of the model parameters, each client additionally secret-shares a verification code of its local parameters and checks integrity by comparing it with the verification code of the global parameters. Finally, anomaly detection is applied to the model parameters issued by the server to detect and exclude potentially corrupt servers. Compared with existing work, the advantages of this scheme lie in verifying the integrity of the aggregation results, detecting malicious tampering by corrupt servers, and, for the first time, analyzing model parameters with anomaly detection; a minimal sketch of these ingredients follows.
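The sketch below is a self-contained illustration, not the paper's exact protocol: additive secret sharing of model parameters over a finite field, a hash-based verification code on the aggregate, and a simple norm-based anomaly check on server-issued parameters. All names and constants (MODULUS, SCALE, is_anomalous, the z-score threshold) are illustrative assumptions.

```python
"""Sketch: additive secret sharing for decentralized model aggregation,
with a hash-based verification code and a norm-based anomaly check.
Illustrative only -- names and parameters are assumptions, not the paper's."""
import hashlib
import numpy as np

N_PARTIES = 3          # number of clients (demo assumption)
MODULUS = 2 ** 31 - 1  # shares live in a finite field, so one share leaks nothing
SCALE = 10 ** 6        # fixed-point encoding of float parameters

rng = np.random.default_rng(0)


def to_field(x):
    """Encode float parameters as fixed-point field elements."""
    return np.round(x * SCALE).astype(np.int64) % MODULUS


def from_field(x, n_parties):
    """Decode the aggregated field vector back to the float average."""
    signed = np.where(x > MODULUS // 2, x - MODULUS, x)  # undo modular wrap
    return signed.astype(np.float64) / SCALE / n_parties


def make_shares(secret, n):
    """Split a field vector into n additive shares that sum to the secret."""
    shares = [rng.integers(0, MODULUS, size=secret.shape, dtype=np.int64)
              for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares


# Each client secret-shares its local model parameters among all parties.
local_models = [rng.normal(size=4) for _ in range(N_PARTIES)]
shares = [make_shares(to_field(m), N_PARTIES) for m in local_models]

# Each party publishes only the sum of the shares it received, so no single
# party (nor a strict subset of colluders) sees any individual local model.
partials = [sum(shares[i][j] for i in range(N_PARTIES)) % MODULUS
            for j in range(N_PARTIES)]
global_model = from_field(sum(partials) % MODULUS, N_PARTIES)

# Verification code: a digest of the aggregate that clients can compare
# (in the paper's scheme the codes are themselves secret-shared).
code = hashlib.sha256(global_model.tobytes()).hexdigest()
print("global model:", np.round(global_model, 6))
print("verification code:", code[:16], "...")
print("matches plaintext mean:",
      np.allclose(global_model, np.mean(local_models, axis=0), atol=1e-5))


def is_anomalous(update, history, z_thresh=3.0):
    """Flag a server-issued update whose norm is an outlier vs. past rounds."""
    norms = np.array([np.linalg.norm(h) for h in history])
    z = abs(np.linalg.norm(update) - norms.mean()) / (norms.std() + 1e-12)
    return z > z_thresh


history = [rng.normal(size=4) for _ in range(10)]  # parameters from earlier rounds
print("tampered update flagged:",
      is_anomalous(100.0 * rng.normal(size=4), history))
```

In this toy version the digest plays the role of the verification code compared after aggregation, and the z-score test stands in for the anomaly-detection step that screens parameters issued by a potentially corrupt server.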
(2) To address the leakage of intermediate computation results in vertical federated learning, a privacy-preserving method for linear vertical federated learning is proposed. This method introduces a semi-trusted third party to manage keys and uses homomorphic encryption and differential privacy to encrypt and perturb the intermediate results and gradients exchanged between the two parties, achieving privacy protection while preserving modeling effectiveness. Existing methods either rely on a trusted third party for homomorphic encryption, which is difficult to find in real life, or suffer poor modeling effectiveness under differential privacy. This work avoids the need for a fully trusted third party, strengthens privacy protection, and preserves model accuracy by optimizing the noise scale through a sensitivity upper-bound calculation and a corresponding perturbation method, as sketched below.
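The following sketch shows only the differential-privacy side of this method: Gaussian noise calibrated from an assumed sensitivity upper bound on one party's partial linear scores before they are shared. The homomorphic-encryption layer and the semi-trusted key manager are omitted, and all names and parameters (perturb_partial_scores, clip_bound, epsilon, delta) are illustrative assumptions rather than the paper's API.

```python
"""Sketch: calibrating differential-privacy noise for the intermediate
results of linear vertical federated learning from a sensitivity bound.
Only the DP side is shown; the homomorphic-encryption layer is omitted."""
import numpy as np

rng = np.random.default_rng(0)


def gaussian_sigma(sensitivity, epsilon, delta):
    """Standard Gaussian-mechanism noise scale for (epsilon, delta)-DP."""
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon


def perturb_partial_scores(X, w, clip_bound, epsilon, delta):
    """Clip party A's partial scores X @ w per sample so that changing one
    record moves the released vector by at most 2*clip_bound in L2 norm
    (the sensitivity upper bound), then add calibrated Gaussian noise."""
    u = np.clip(X @ w, -clip_bound, clip_bound)
    sigma = gaussian_sigma(2.0 * clip_bound, epsilon, delta)
    return u + rng.normal(0.0, sigma, size=u.shape)


# Two-party linear regression: A holds XA, B holds XB and the labels y.
n, dA, dB = 200, 3, 2
XA, XB = rng.normal(size=(n, dA)), rng.normal(size=(n, dB))
wA, wB = rng.normal(size=dA), rng.normal(size=dB)
y = rng.normal(size=n)

# A releases only perturbed partial scores; B finishes the gradient step.
uA = perturb_partial_scores(XA, wA, clip_bound=5.0, epsilon=1.0, delta=1e-5)
residual = uA + XB @ wB - y        # regression residual computed at B
grad_B = XB.T @ residual / n       # B's gradient from the noisy intermediate
print("B's gradient:", np.round(grad_B, 4))
```

Clipping enforces the per-sample bound from which the L2 sensitivity, and hence the noise scale, is derived; tightening that bound so that less noise suffices for the same privacy budget is in the spirit of the sensitivity upper-bound optimization described above.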