| Logistic regression is a typical machine learning algorithm that is widely used in financial forecasting, medical pre-diagnosis, recommendation systems, and so on. Building a reliable logistic regression model relies on large-scale data samples. In reality, however, data are usually stored separately and involve sensitive information about the participants. A centralized framework in which data are uploaded directly to the cloud server leads to serious privacy leakage. Therefore, the secure construction of logistic regression models has attracted close attention from researchers. Although there have been many research advances in privacy-preserving logistic regression, some disadvantages remain: 1) training models that require multiple rounds of user-server interaction incurs high computational costs and communication burdens; 2) schemes for horizontal partitioning are mostly hard to extend to vertical partitioning and require high-frequency communication among the participants; 3) non-interactive schemes based on a dual-server architecture require multiple rounds of interaction between the servers, and the reliability of the system relies too heavily on the network configuration. To overcome these challenges, the goal of this work is to construct non-interactive privacy-preserving logistic regression schemes for vertically partitioned data. On this basis, a privacy-preserving online medical pre-diagnosis system based on logistic regression is designed and implemented. The main work and contributions are as follows: 1. First, in order to avoid multi-round interaction between users and cloud servers, a non-interactive privacy-preserving logistic regression model (SLRT) for horizontally partitioned data is proposed. Combined with an approximate replacement strategy for the loss function, a new form of gradient calculation is constructed that separates the sample data from the model parameters in the mathematical expression, so as to ensure that the user preprocesses the
local data according to the separation form and uploads it to the cloud server by secret sharing. In addition, a security analysis and numerical experiments are provided for the protocol, showing that it preserves user privacy while maintaining high model precision. Compared with the non-interactive scheme VANE, efficiency is improved by at least 10^2 times. 2. By extending the SLRT model for horizontal partitions, a privacy-preserving logistic regression model (VPPLR) for vertically partitioned data sets is proposed. By decoupling the hybrid computation between user data and model parameters, local and cloud computing tasks are separated. In the preprocessing stage, the sample vector is padded to generate the local data matrix, and a vectorization method is introduced in the training stage to ensure that the global parameter update is completed in the plaintext domain, which makes the scheme more efficient than interactive methods based on homomorphic encryption. The security analysis proves that the scheme preserves data privacy and model privacy at the same time. Numerical experiments show that the scheme achieves high efficiency and high precision. 3. In order to solve the problem that traditional non-interactive schemes based on a dual-server architecture require multi-round communication between cloud servers, a non-interactive logistic regression model based on functional encryption (PPNLR) is designed on top of the VPPLR model for vertical partitions. Private data are protected by defining a ‘sample-feature dimension encryption’ strategy. Based on blinding and functional encryption, model training is completed on a single server without interacting with other participants. In application, a privacy-preserving online medical pre-diagnosis system is designed and implemented based on PPNLR, which demonstrates its practicability in real deployment. |
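
The separation of sample data from model parameters described in contribution 1 can be illustrated with a small sketch. The abstract does not state which approximation SLRT uses, so this example assumes the common first-order Taylor replacement of the sigmoid, σ(z) ≈ 1/2 + z/4. Under that assumption the gradient becomes (b + Aθ/4)/n, where A = Σ x xᵀ and b = Σ (1/2 − y) x depend only on the data. Each user can therefore compute A and b locally, upload them once as additive secret shares, and take no further part in training. All names below (`local_summaries`, `share`, `train`, `PRIME`, `SCALE`) are hypothetical, and reconstructing the aggregate in the clear is a simplification for illustration, not the protocol from this work.

```python
import secrets

PRIME = 2**61 - 1   # modulus for additive sharing (illustrative choice)
SCALE = 10**6       # fixed-point scale for real-valued entries (illustrative choice)

def encode(x):
    """Map a real number to a fixed-point field element."""
    return round(x * SCALE) % PRIME

def decode(v):
    """Map a field element back to a signed real number."""
    v %= PRIME
    if v > PRIME // 2:
        v -= PRIME
    return v / SCALE

def share(x, n_servers=2):
    """Split x into additive shares that sum to encode(x) modulo PRIME."""
    parts = [secrets.randbelow(PRIME) for _ in range(n_servers - 1)]
    parts.append((encode(x) - sum(parts)) % PRIME)
    return parts

def local_summaries(X, y):
    """User-side preprocessing: A = sum x x^T (flattened) and b = sum (1/2 - y) x."""
    d = len(X[0])
    A = [sum(x[j] * x[k] for x in X) for j in range(d) for k in range(d)]
    b = [sum((0.5 - yi) * x[j] for x, yi in zip(X, y)) for j in range(d)]
    return A + b   # d*d + d numbers, none of which involve the model parameters

def aggregate(messages):
    """Server side: add the shares received from every user, entry by entry."""
    total = [0] * len(messages[0])
    for msg in messages:
        total = [(t + s) % PRIME for t, s in zip(total, msg)]
    return total

def approx_gradient(summary, theta, n):
    """Gradient of the approximated loss: (b + A theta / 4) / n."""
    d = len(theta)
    A, b = summary[:d * d], summary[d * d:]
    return [(b[j] + 0.25 * sum(A[j * d + k] * theta[k] for k in range(d))) / n
            for j in range(d)]

def train(users, n_servers=2, lr=0.5, epochs=200):
    """Users upload shares once; training then needs no further user input."""
    inbox = [[] for _ in range(n_servers)]
    for X, y in users:
        shares = [share(v, n_servers) for v in local_summaries(X, y)]
        for s in range(n_servers):
            inbox[s].append([sh[s] for sh in shares])
    per_server = [aggregate(msgs) for msgs in inbox]
    # Simplification: the aggregated summaries are reconstructed in the clear.
    summary = [decode(sum(col)) for col in zip(*per_server)]
    n = sum(len(X) for X, _ in users)
    theta = [0.0] * len(users[0][0][0])
    for _ in range(epochs):   # lr must suit the data scale; 0.5 fits the toy data
        g = approx_gradient(summary, theta, n)
        theta = [t - lr * gj for t, gj in zip(theta, g)]
    return theta, summary

if __name__ == "__main__":
    # Two users holding different samples (horizontal partition); x[0] is a bias term.
    users = [([[1.0, 2.0], [1.0, -1.0]], [1, 0]),
             ([[1.0, 0.5], [1.0, -2.0]], [1, 0])]
    theta, _ = train(users)
    print(theta)
```

Because the aggregate is a plain sum of per-user summaries, a single share by itself reveals nothing about an individual user's A and b, while the sum of all shares equals the summaries of the pooled data set. The fixed-point encoding introduces a rounding error on the order of 1/`SCALE` per entry.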