Machine learning is a popular and practical way to predict and classify data, and it is particularly helpful in medicine, epidemiological investigation, and other fields. However, patients are reluctant to provide sensitive data such as electronic medical records to third-party institutions for investigation, analysis, and model training, even though these data have strong research value. We call this the “Data Island” problem. Federated Learning (FL) is a class of technologies that enables decentralized data sources to jointly train neural network models without sharing raw data. Horizontal FL refers to training across data sources whose features overlap heavily but whose samples overlap little. This work belongs to research on horizontal FL.

In the classical approach to horizontal federated learning, each data holder trains the model locally and then uploads the local gradient of each iteration round to a central server. However, sharing local gradients leaves the local data vulnerable to attack. The aggregation server may also return incorrect results to the clients, whether through unexpected errors or deliberate misbehavior. Verifiable, privacy-preserving federated learning is therefore a promising research direction. This work further considers public verifiability. Compared with centralized verifiability, public verifiability has higher application value, because it allows any honest verifier to perform verification even if the verifier does not participate in the protocol or know any private information.

Existing work suffers from insufficient privacy protection, difficulty resisting malicious adversaries, and reliance on extra interaction to handle dropped users. This work discusses and classifies these problems and gives two solutions for different scenarios, which address the problems from different angles. The research covers the following:

· This work proposes a scheme, NIPVS-FL, to ensure the privacy and correctness of aggregation results under malicious-server settings. It adopts a two-server architecture, which protects users' data privacy when the servers are malicious but do not collude. It also ensures public verifiability of the results and, if a result is wrong, traces which server miscalculated and caused the error. The threshold Paillier cryptosystem is adopted to protect privacy, a correct-exponentiation proof algorithm is adopted to ensure verifiability, and a linear homomorphic hash is adopted to verify the aggregation result. Together these provide comprehensive protection of both privacy and computation results.

· Single-server and dual-server architectures carry the risk of a single point of failure. A distributed system architecture is needed when resources are sufficient or high reliability is required. The scheme NID-FL is therefore proposed. A Byzantine consensus algorithm is used to ensure that the servers behave correctly, and secure data aggregation is realized through pairwise masking. Combining the two yields a non-interactive secure aggregation scheme between users and servers, which minimizes the impact of network delay. The scheme is further optimized with a spare-key mechanism that lets dropped users quickly rejoin the aggregation.

Both schemes offer non-interactive secure aggregation, verifiability, and privacy protection against malicious adversaries. They perform well in computation and communication overhead and are suited to the complex and variable network environments found in practice.
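To illustrate the masking idea behind secure aggregation in NID-FL, here is a minimal Python sketch of pairwise masks that cancel when the server sums the uploads. It is not the NID-FL protocol itself. The modulus, the SHA-256 based mask expansion, the hard-coded seeds, and all function names are assumptions made for illustration. A real scheme would derive each pair's seed through key agreement and would also handle dropped users and server consensus.

```python
import hashlib

MODULUS = 2**32  # assumed ring size for quantized gradient entries


def pair_mask(seed: bytes, length: int) -> list[int]:
    """Expand a shared seed into a pseudorandom mask vector (illustrative PRG)."""
    mask = []
    for counter in range(length):
        digest = hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        mask.append(int.from_bytes(digest[:4], "big") % MODULUS)
    return mask


def mask_update(user_id: int, update: list[int],
                seeds: dict[tuple[int, int], bytes]) -> list[int]:
    """Add the pair mask if this user has the smaller id, subtract it otherwise."""
    masked = [value % MODULUS for value in update]
    for (low, high), seed in seeds.items():
        if user_id not in (low, high):
            continue
        sign = 1 if user_id == low else -1
        for index, mask_value in enumerate(pair_mask(seed, len(update))):
            masked[index] = (masked[index] + sign * mask_value) % MODULUS
    return masked


def aggregate(masked_updates: list[list[int]]) -> list[int]:
    """Sum masked vectors; each pair mask appears once with + and once with -."""
    return [sum(column) % MODULUS for column in zip(*masked_updates)]


# Hypothetical toy run: three users, seeds hard-coded instead of key agreement.
updates = {1: [3, 5, 7], 2: [10, 20, 30], 3: [1, 1, 1]}
seeds = {(1, 2): b"seed-12", (1, 3): b"seed-13", (2, 3): b"seed-23"}
masked = [mask_update(uid, vec, seeds) for uid, vec in updates.items()]
total = aggregate(masked)
print(total)  # [14, 26, 38]
```

The server sees only the masked vectors, yet their sum equals the sum of the true updates because every mask is added by one user and subtracted by its partner.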
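The role of a linear homomorphic hash in checking an aggregation result can be sketched as follows. This is a toy construction under stated assumptions, not the hash used in NIPVS-FL. The modulus, the public bases, and the function names are hypothetical, and the parameters are far too simple to be secure. The sketch shows only the property that matters for verification: the hash of a sum of vectors equals the product of the individual hashes.

```python
P = 2**127 - 1      # a Mersenne prime, used here only as a toy modulus
BASES = [3, 5, 7]   # assumed public bases, one per vector coordinate


def lh_hash(vector: list[int]) -> int:
    """Hash a vector of non-negative integers as the product of base**value mod P."""
    digest = 1
    for base, value in zip(BASES, vector):
        digest = (digest * pow(base, value, P)) % P
    return digest


def combine(hashes: list[int]) -> int:
    """Multiply individual hashes; this corresponds to adding the vectors."""
    result = 1
    for item in hashes:
        result = (result * item) % P
    return result


def verify_aggregate(claimed_sum: list[int], user_hashes: list[int]) -> bool:
    """Anyone holding the published hashes can check a claimed aggregate."""
    return lh_hash(claimed_sum) == combine(user_hashes)


# Hypothetical toy run with three user updates.
user_updates = [[3, 5, 7], [10, 20, 30], [1, 1, 1]]
published = [lh_hash(update) for update in user_updates]
print(verify_aggregate([14, 26, 38], published))  # True
print(verify_aggregate([14, 26, 39], published))  # False
```

Because the check uses only published hashes and the claimed result, it needs no private input, which is the sense in which such a verification can be public.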