
Research On Privacy-preserving Data Computation In Machine Learning

Posted on: 2022-04-12  Degree: Doctor  Type: Dissertation
Country: China  Candidate: J H Wu  Full Text: PDF
GTID: 1488306530992869  Subject: Computational intelligence and information processing
Abstract/Summary:
With the development of cloud computing and distributed computing, machine learning algorithms such as data mining and deep learning can exploit the advantages of big data to train more accurate models, and cloud outsourcing learning and federated learning have therefore become two popular big-data machine learning paradigms. However, both paradigms require data owners to provide their original data or to share training parameters derived from local data, and providing this information carries the risk of privacy leakage. For example, big data collected in health care and finance may expose private information, including basic personal information, patient medical records, and economic data; once such data are disclosed, personal life and property can be seriously threatened. It is therefore of great significance to study privacy-preserving big-data computation methods for cloud outsourcing machine learning and federated learning.

This dissertation studies privacy-preserving and secure computation for multi-data-owner joint association rule mining in the cloud computing environment and for multi-client federated deep learning in a distributed setting. It analyzes existing attack methods, designs secure data encryption algorithms, and, according to the respective characteristics of joint association rule mining and federated learning, designs dedicated privacy-preserving computation schemes for the different types of encrypted data. In both machine learning settings, the data submitted by the data owner/client to the cloud server are encrypted; the cloud server computes over the encrypted data and returns the prediction result to the data owner/client in encrypted form. The dissertation proves the applicability of the proposed machine learning models and evaluates their performance. Experimental results show that the proposed schemes provide accurate privacy-preserving association rule mining and deep learning classification. The main research results are as follows:

(1) A database fuzzification method is designed to enable efficient privacy-preserving data mining. To keep data mining and deep learning accurate, the privacy-preserving computations in this dissertation are carried out on ciphertext data. Given the huge volume of a joint database, however, encrypting the whole dataset, or computing over such a large ciphertext, would incur a very heavy computational and storage burden. Therefore, the dataset used for mining is not encrypted directly; instead, a fuzzification method inserts virtual transactions into the database to confuse adversaries, which protects the privacy of the database. To keep the fuzzified dataset usable (i.e., to allow mining computation over it), each transaction in the dataset is marked with a tag: 1 for a real transaction and 0 for a virtual one. The mining result is then determined jointly by the result over the fuzzified dataset and the tags, as the sketch below illustrates.
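The following toy sketch makes the tag mechanism of contribution (1) concrete: virtual transactions are mixed into a small transaction database, and itemset support is counted through the tags so that the inserted rows do not distort the mining result. The item universe, the virtual-transaction generator, and the plaintext tags are illustrative simplifications; in the dissertation's scheme the tags are encrypted with the symmetric homomorphic cipher of contribution (2) below.

```python
# Toy sketch of tag-based database fuzzification (illustrative only).
import random

ITEMS = ["bread", "milk", "eggs", "beer", "diapers"]   # illustrative item universe

def fuzz_database(real_db, num_virtual):
    """Mix virtual transactions into the real database and tag every row:
    tag = 1 for a real transaction, tag = 0 for an inserted (virtual) one."""
    tagged = [(trans, 1) for trans in real_db]
    for _ in range(num_virtual):
        size = random.randint(1, len(ITEMS))
        tagged.append((set(random.sample(ITEMS, size)), 0))
    random.shuffle(tagged)                     # hide which rows are real
    return tagged

def tagged_support(fuzzed_db, itemset):
    """Support counted through the tags: virtual rows confuse an observer
    but contribute 0, so the mining result is not distorted."""
    return sum(tag for trans, tag in fuzzed_db if itemset <= trans)

real_db = [{"bread", "milk"}, {"bread", "milk", "eggs"}, {"beer", "diapers"}]
fuzzed = fuzz_database(real_db, num_virtual=5)
print(tagged_support(fuzzed, {"bread", "milk"}))       # 2, as in the real database
```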
(2) A multi-key homomorphic encryption mechanism is designed to resist attacks by multiple partners in the process of data mining and deep learning. Specifically, a secret-key partition method based on multi-party negotiation is designed, and two homomorphic encryption algorithms built on this key partition method are constructed: a symmetric homomorphic encryption algorithm based on exponential multiplication, and an additive homomorphic encryption algorithm based on elliptic curves (EC-AHE). The symmetric homomorphic encryption algorithm is used to encrypt the tags of the database to be mined, and a secure comparison algorithm is designed for data mining so that multi-party joint mining can be carried out correctly. Moreover, an EC-AHE-based privacy protection mechanism is designed to mask the local gradient of each client, making it difficult for malicious adversaries and the cloud server to infer the original information of the dataset (a toy sketch of such an additively homomorphic scheme is given after this summary). The proposed privacy-preserving mechanism keeps the prediction accuracy of the trained model high while balancing security and efficiency.

(3) A data homomorphism verification scheme is designed to prevent the global parameters from being tampered with by malicious adversaries or cloud servers. The mechanism allows the distributed clients to verify whether the aggregate ciphertext obtained from the cloud server is indeed the fusion of the local data ciphertexts of all federated clients; the verification method must therefore be able to check the homomorphic computation itself. Specifically, the dissertation designs a homomorphic hash function based on elliptic curves that maps input data of any length to a fixed-length digest and satisfies the homomorphic property: a computation on the input data can be transferred directly to a computation on the digests of the input data. Owing to the one-wayness and collision resistance of the hash function, adversaries or cloud servers can neither recover the original input from a digest nor tamper with the input and digest without the secret keys. The verification mechanism based on this elliptic-curve homomorphic hash can therefore be used to check whether the global parameters (i.e., the aggregated gradient) returned by the adversary/cloud server are correct (see the second sketch below).

(4) A fast synchronous stochastic gradient descent (F-SSGD) method is designed to guarantee fast training and convergence of the federated learning model under multi-client heterogeneity, i.e., when the clients' computing power differs or the client datasets are not identically distributed. Specifically, F-SSGD sets a time period during which a client with stronger computing power keeps computing multiple local gradients instead of waiting for clients with weaker computing power, and the multiple gradient copies are weighted to ensure training convergence and to prevent the final model from being biased toward the client with the fastest computing speed. When the time period ends, all clients submit their locally aggregated gradients to the cloud for the next model update (the last sketch below simulates such a period). It is proved both experimentally and theoretically that F-SSGD guarantees convergence of the federated model, with a convergence rate of O(1/M), where M is the number of iterations.
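For contribution (2), the sketch below illustrates one common way to obtain elliptic-curve additive homomorphic encryption: an exponential (EC-ElGamal-style) scheme in which Enc(m) = (rG, mG + rQ) with public key Q = sk·G, so that adding two ciphertexts component-wise yields an encryption of the sum. The toy curve, the single decryption key, and the integer-encoded gradient values are illustrative assumptions with no real security; this sketches the general technique, not the dissertation's exact EC-AHE construction or its multi-party key partition.

```python
# Toy exponential EC-ElGamal over a deliberately tiny curve (illustrative only,
# NOT secure): Enc(m) = (r*G, m*G + r*Q) with Q = sk*G; ciphertexts add component-wise.
import random

P = 2**61 - 1                      # field prime (P % 4 == 3, so square roots are easy)
A, B = 2, 3                        # curve: y^2 = x^3 + A*x + B  (mod P)
O = None                           # point at infinity

def find_base_point():
    """Scan x until x^3 + A*x + B is a square mod P, then take a square root."""
    x = 2
    while True:
        rhs = (x**3 + A * x + B) % P
        y = pow(rhs, (P + 1) // 4, P)          # candidate root (valid since P % 4 == 3)
        if y * y % P == rhs:
            return (x, y)
        x += 1

def ec_add(p1, p2):
    if p1 is O: return p2
    if p2 is O: return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return O
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def ec_mul(k, pt):                             # double-and-add scalar multiplication
    acc = O
    while k:
        if k & 1:
            acc = ec_add(acc, pt)
        pt = ec_add(pt, pt)
        k >>= 1
    return acc

def ec_neg(pt):
    return O if pt is O else (pt[0], (-pt[1]) % P)

G = find_base_point()
sk = random.randrange(2, P)                    # decryption key (held by the clients)
Q = ec_mul(sk, G)                              # public encryption key

def encrypt(m):
    r = random.randrange(2, P)
    return (ec_mul(r, G), ec_add(ec_mul(m, G), ec_mul(r, Q)))

def add_ciphertexts(c, d):                     # what the cloud server does: aggregate
    return (ec_add(c[0], d[0]), ec_add(c[1], d[1]))

def decrypt_to_point(c):                       # yields (sum of plaintexts) * G
    return ec_add(c[1], ec_neg(ec_mul(sk, c[0])))

# Two clients encrypt integer-encoded gradient shares; the server only ever
# touches ciphertexts, yet the decrypted aggregate equals the plaintext sum.
c_sum = add_ciphertexts(encrypt(7), encrypt(35))
assert decrypt_to_point(c_sum) == ec_mul(42, G)
print("aggregated plaintext is 7 + 35 = 42")
```

Note that decryption yields the point (m1+m2)·G rather than the integer itself; recovering the integer requires a small discrete-logarithm search over the bounded, quantized gradient range, which is the usual trade-off of exponential ElGamal-style schemes.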
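For contribution (3), the sketch below shows how a homomorphic hash lets each client check the aggregation: if H(v)·H(w) = H(v + w), then the digest of the server's aggregate must equal the product of the per-client digests. The dissertation builds its hash on an elliptic curve; to keep this sketch short and self-contained it uses the well-known multiplicative-group analogue H(v) = ∏ g_i^(v_i) mod p, with all parameters chosen purely for illustration.

```python
# Toy homomorphic-hash verification (multiplicative-group analogue; illustrative only).
import random

P = 2**127 - 1                     # a Mersenne prime used as the toy group modulus
DIM = 4                            # length of the quantized gradient vectors
random.seed(1)
GENS = [random.randrange(2, P) for _ in range(DIM)]   # fixed public generators

def hh(vec):
    """Homomorphic hash of an integer vector: prod_i g_i ** v_i (mod P)."""
    digest = 1
    for g, v in zip(GENS, vec):
        digest = digest * pow(g, v, P) % P
    return digest

# Three clients hold quantized local gradients and publish their digests.
grads = [[3, 1, 4, 1], [5, 9, 2, 6], [5, 3, 5, 8]]
digests = [hh(g) for g in grads]

# Each client checks the server's aggregate against the product of the digests.
aggregate = [sum(col) for col in zip(*grads)]
expected = 1
for d in digests:
    expected = expected * d % P
assert hh(aggregate) == expected   # honest aggregation passes the check

tampered = list(aggregate)
tampered[0] += 1
assert hh(tampered) != expected    # any modification changes the digest
print("aggregate verified")
```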
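For contribution (4), the following toy simulation mimics the F-SSGD round structure: within one synchronization period a fast client evaluates several local gradients and weights them into a single local update, so the global step is not dominated by the fastest client. The one-dimensional quadratic objective, the client speed profile, and the equal weighting are illustrative assumptions, not the dissertation's setup.

```python
# Toy simulation of an F-SSGD-style training run (illustrative only).
import random

random.seed(0)
TARGET = 3.0                                     # every client minimizes (w - TARGET)^2
STEPS_PER_PERIOD = {"fast": 5, "medium": 2, "slow": 1}   # heterogeneous compute power

def local_aggregate_gradient(w, num_local_steps):
    """Evaluate several noisy local gradients starting from w and weight them
    (equally here) into the single gradient this client submits for the period."""
    grads, w_local = [], w
    for _ in range(num_local_steps):
        g = 2.0 * (w_local - TARGET) + random.gauss(0, 0.1)
        grads.append(g)
        w_local -= 0.1 * g                       # extra local step between evaluations
    return sum(grads) / len(grads)

w = 0.0                                          # global model parameter
for period in range(100):                        # synchronization periods
    client_grads = [local_aggregate_gradient(w, k) for k in STEPS_PER_PERIOD.values()]
    w -= 0.1 * sum(client_grads) / len(client_grads)   # synchronous global update
print(round(w, 2))                               # converges near TARGET = 3.0
```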
Keywords/Search Tags:Data mining, Federated optimization, Federated learning, Cloud computing, Privacy protection