
Research On Data Privacy-Preserving In Machine Learning

Posted on: 2022-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: H C Wang
Full Text: PDF
GTID: 2518306323966699
Subject: Cyberspace security
Abstract/Summary:
The Internet's exponential growth has ushered in a new age of artificial intelligence. Data has become a core driver of innovation and growth, and mastering it means unlocking the Internet's future. Machine learning, as an important branch of artificial intelligence, relies on the collection and training of massive amounts of data to achieve intelligent behavior. However, such data almost always contains some private information; once it is collected and shared, its use can no longer be effectively controlled, which seriously threatens the privacy of data owners. There is therefore an urgent need for a privacy-preserving machine learning solution that protects data privacy effectively. Currently, most researchers concentrate their efforts on data encryption or perturbation. Essentially, these approaches make machine learning algorithms operate on ciphertext data so that the original data is inaccessible, thereby ensuring data privacy. In fact, even if a malicious attacker has no direct access to the data, it can still recover data information through reconstruction attacks once it obtains the gradient (model parameters) produced during training, causing indirect data leakage. In response to the privacy issues caused by the leakage of machine learning gradients, this dissertation proposes a secure privacy-preserving machine learning scheme. The main research contents are as follows.

In the privacy-preserving machine learning scheme based on multi-party joint masking, this dissertation introduces three parties: a certification authority, an aggregation server, and the participants. Each participant stores its data and trains the model locally, and then negotiates keys with the other participants. After the mask is computed, each participant converts its original model parameters and sends them to the server for aggregation. The aggregate of the converted model parameters equals that of the original model parameters, so the aggregation server can compute the aggregate without learning the actual value of any individual parameter. As a result, even if the aggregation server is malicious, it cannot recover the participants' data through a gradient-matching reconstruction attack, which protects data privacy. We evaluate the accuracy and computation time of the scheme. The experimental results show that the accuracy still reaches more than 93%, and the computation time is kept within milliseconds, giving the scheme considerable practicality and versatility.

In the privacy-preserving machine learning scheme based on homomorphic encryption, this dissertation adopts the traditional Paillier encryption system; the overall workflow is similar to that of the multi-party joint masking scheme. The participants encrypt their local model parameters and send them to the server for aggregation. Relying on the additive property of homomorphic encryption, the server performs the computation on the ciphertexts and sends the new result back to the participants, who update their local models after decryption. In the experiments, we mainly evaluated the influence of the key size on the encryption and decryption time. When the key length exceeds 2048 bits, security can be effectively guaranteed; the encryption time is kept within 20 milliseconds and the decryption time within 5 milliseconds, which is far lower than the computation time of model training itself. Therefore, the scheme has good practicability in certain scenarios.
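To make the masking step concrete, the following is a minimal Python/NumPy sketch of the general pairwise-masking idea: every pair of participants derives a shared mask from a jointly negotiated seed, one side adds the mask and the other subtracts it, so all masks cancel when the aggregation server sums the uploads. The three-participant setup, the fixed seed values, and the random parameter vectors are illustrative assumptions, not the dissertation's actual key-agreement protocol or code.

# Sketch of pairwise masking: masks cancel in the aggregate (illustrative only).
import numpy as np

NUM_PARTIES = 3
DIM = 4  # length of the flattened model-parameter vector (illustrative)

# Local model parameters of each participant (stand-ins for trained gradients).
local_params = [np.random.rand(DIM) for _ in range(NUM_PARTIES)]

# Pairwise seeds, chosen publicly here for illustration only; in the scheme
# they would come from the key agreement between participants.
pair_seed = {(i, j): 1000 * i + j
             for i in range(NUM_PARTIES)
             for j in range(NUM_PARTIES) if i < j}

def masked_upload(me):
    # Return participant `me`'s parameters with all pairwise masks applied.
    upload = local_params[me].copy()
    for peer in range(NUM_PARTIES):
        if peer == me:
            continue
        a, b = min(me, peer), max(me, peer)
        mask = np.random.default_rng(pair_seed[(a, b)]).standard_normal(DIM)
        upload += mask if me < peer else -mask  # the two signs cancel per pair
    return upload

# The aggregation server only ever sees masked vectors ...
aggregate = sum(masked_upload(i) for i in range(NUM_PARTIES))
# ... yet their sum equals the sum of the true parameters.
assert np.allclose(aggregate, sum(local_params))

Because the server only receives masked vectors, the individual gradients needed for a gradient-matching reconstruction attack are never exposed, while the aggregate it computes remains exact.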
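The homomorphic-encryption variant can be sketched in the same spirit, assuming the third-party python-paillier (phe) package; the scalar updates below stand in for full parameter vectors and are illustrative, not the dissertation's implementation.

# Sketch of Paillier-based aggregation over ciphertexts (illustrative only).
from phe import paillier

# Key generation; the abstract reports 2048-bit keys as the practical choice.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Each participant encrypts its local model update (a single scalar here
# stands in for a parameter vector) before sending it to the server.
local_updates = [0.12, -0.05, 0.33]
ciphertexts = [public_key.encrypt(u) for u in local_updates]

# The server aggregates directly on ciphertexts using the additive
# homomorphism; it never sees the plaintext updates.
encrypted_sum = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_sum = encrypted_sum + c

# Participants, who hold the private key, decrypt the aggregate and update
# their local models.
aggregate = private_key.decrypt(encrypted_sum)
print(aggregate)  # approximately 0.40

Since only the participants hold the private key, the server operates purely on ciphertexts and never observes a plaintext gradient; the cost is the per-value encryption and decryption time that the experiments measure against the key length.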
Keywords/Search Tags: Privacy-Preserving, Machine Learning, Key Agreement, Homomorphic Encryption