
Research On Linear Regression Algorithm For Data Privacy Preserving Based On Two-Party Computing

Posted on: 2022-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: M S Li
GTID: 2518306527998459
Subject: Software engineering
Abstract/Summary:
Linear regression analyzes data to make well-founded predictions, helping enterprises and organizations manage and decide more effectively than traditional approaches based on experience and intuition. As one of the most classic basic algorithms in machine learning and data mining, it also relies on linear computations that are central to other learning algorithms, such as logistic regression and neural networks. Studying linear regression, with its representative computational process, is therefore worthwhile.

With the development of the big data era and growing demands on the performance of linear regression models, more and more enterprises and organizations want to train models on large-scale data sets, but their local resources cannot meet the computing requirements. Cloud computing services have become an increasingly popular, economical solution for outsourcing data and computation, thanks to on-demand deployment, high flexibility, and scalability, and they provide instant access to shared storage and computing resources. Many organizations are therefore gradually adopting them. At the same time, cloud services are not fully trustworthy and pose potential threats to data security. Storing data with cloud providers raises security and privacy concerns, because providers can not only inspect the data but may also share it with other parties, which makes data confidentiality difficult to guarantee. Exploring linear regression algorithms that preserve data privacy is therefore an inevitable trend.

Secure multi-party computation offers a good answer to these problems: it addresses private computation among multiple parties and allows them to jointly evaluate a function without disclosing each party's private information. It can therefore be applied to practical real-world problems, and solutions based on it have been proposed in many fields. Many privacy-preserving schemes for linear regression have already been proposed, but challenges remain, notably the trade-off between computation and communication overhead on the one hand and accuracy on the other, which is still an open question in privacy-preserving machine learning. Following a two-party collaborative computing approach to the mini-batch gradient descent algorithm under secret sharing, this thesis investigates privacy-preserving linear regression in the secure two-party computation setting. The main innovations are as follows:

(1) A privacy-preserving linear regression scheme, AHM-LR, based on Paillier homomorphic encryption is proposed. Because data providers have limited local computing and storage resources and cloud computing platforms are untrusted, this thesis uses two non-colluding cloud servers to perform linear regression tasks securely under secret sharing. The data and the model parameters are held by the two servers as secret shares, so optimizing the model parameters with mini-batch gradient descent must not leak either party's shares. To this end, a multiplication of secret-shared values based on the additive homomorphism of Paillier encryption is constructed, named the AHM protocol, and it is introduced into the linear regression task to realize a privacy-preserving linear regression algorithm based on two-party computation. Experiments show that the AHM-LR scheme ensures the confidentiality of the data and model parameters and the prediction accuracy of the model, while keeping the training and prediction processes efficient.

(2) A privacy-preserving linear regression scheme, RPM-LR, based on random data perturbation is proposed. Because the encryption and decryption operations of homomorphic encryption inevitably incur high time overhead, this scheme performs the multiplication of secret-shared values outside the homomorphic encryption framework, avoiding those operations. Bose's one-way protocol for quantification product symbol calculation is extended to a matrix form suited to the mini-batch gradient descent algorithm, yielding a privacy-preserving linear regression algorithm based on random data perturbation. Experiments show that the RPM-LR scheme runs much faster than the AHM-LR scheme.

The experimental validation uses the Boston and Diabetes data sets, which are typical data sets for regression tasks. The results show that the AHM-LR scheme ensures the confidentiality of the data and model parameters, and that the evaluation results of the trained linear regression model are almost the same as those obtained in the conventional mode. The RPM-LR scheme preserves data privacy and achieves the same evaluation results as AHM-LR, while also improving the time performance of privacy-preserving linear regression by 99.93%, a significant gain in efficiency.
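
The abstract does not spell out the AHM protocol, so the following is only a minimal sketch of the general idea behind innovation (1): multiplying two additively secret-shared values with the help of Paillier's additive homomorphism. Everything in it is an assumption made for illustration, not the thesis's construction: the toy key (two fixed Mersenne primes, far too small to be secure), the function names (`share`, `cross_term_shares`, `shared_multiply`), the restriction to integers rather than fixed-point reals, and the fact that both parties run in one process.

```python
import math
import secrets

# Toy Paillier key built from two Mersenne primes. Illustration only: NOT secure.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q                      # public modulus; shares live in Z_N
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # secret key: lcm(P-1, Q-1)
MU = pow(LAM, -1, N)           # modular inverse (Python 3.8+)

def encrypt(m):
    """Paillier encryption with generator g = N + 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return ((1 + (m % N) * N) * pow(r, N, N2)) % N2

def decrypt(c):
    """Paillier decryption; returns the plaintext in [0, N)."""
    u = pow(c, LAM, N2)
    return ((u - 1) // N * MU) % N

def share(x):
    """Split integer x into two additive shares modulo N."""
    x0 = secrets.randbelow(N)
    return x0, (x - x0) % N

def reconstruct(shares):
    """Recombine shares and map the result back to a signed integer."""
    v = (shares[0] + shares[1]) % N
    return v - N if v > N // 2 else v

def cross_term_shares(a, b):
    """Party 0 holds a (and the secret key), party 1 holds b.
    Returns (s0, s1) with s0 + s1 = a*b (mod N)."""
    c_a = encrypt(a)                          # party 0 sends Enc(a)
    r = secrets.randbelow(N)                  # party 1 picks a random mask
    # Enc(a)^b * Enc(-r) = Enc(a*b - r), computed without seeing a
    c = (pow(c_a, b % N, N2) * encrypt(-r)) % N2
    return decrypt(c), r                      # party 0 gets a*b - r; party 1 keeps r

def shared_multiply(x_sh, y_sh):
    """Shares of x*y from shares of x and y.
    x*y = x0*y0 + x1*y1 (local terms) + x0*y1 + x1*y0 (cross terms)."""
    x0, x1 = x_sh
    y0, y1 = y_sh
    u0, u1 = cross_term_shares(x0, y1)        # shares of x0*y1
    v0, v1 = cross_term_shares(y0, x1)        # shares of x1*y0
    z0 = (x0 * y0 + u0 + v0) % N
    z1 = (x1 * y1 + u1 + v1) % N
    return z0, z1

if __name__ == "__main__":
    z = shared_multiply(share(12), share(-7))
    print(reconstruct(z))
```

In a mini-batch gradient descent step, products of this kind (data share times parameter share) would be the only operations that need interaction; additions and subtractions of shared values can be done by each server locally on its own shares.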
Keywords/Search Tags:linear regression, secret sharing, secure two-party computation, Paillier homomorphic encryption, random data perturbation