
Research On Distributed Machine Learning Oriented Big Data Security Protection Technology

Posted on: 2022-03-30 | Degree: Master | Type: Thesis
Country: China | Candidate: X P Zhao
GTID: 2518306740994229 | Subject: Electronics and Communications Engineering
Abstract/Summary:
With the development of artificial intelligence and Internet technology, distributed machine learning has become a research focus in both academia and the IT industry. Distributed machine learning can greatly speed up model training, thereby accelerating problem solving and greatly improving production efficiency. The parameter server is one of the most widely used frameworks in distributed machine learning. Because the training data in the parameter server framework are outsourced to cloud servers for storage, the cost of local storage drops sharply, but outsourcing also introduces security issues, including data integrity, data privacy, and key escrow. Meanwhile, in the parameter server framework the training data must be stored on data servers, where they may be damaged or lost. To address the problems of data integrity, data loss, data privacy, and key escrow in the current parameter server framework, we study big data security protection technology for distributed machine learning. Our goals are to ensure the integrity of both the parameters and the training data in the parameter server framework, and to solve the data privacy, proxy signature, and key escrow problems that arise during integrity verification, so as to ensure the correctness of the model trained by distributed machine learning. The main contributions of this thesis are as follows.

First, to address the integrity of training data in distributed machine learning, we propose a training data integrity protection scheme for distributed machine learning (DML-DIV). First, DML-DIV introduces a third-party auditor (TPA) that periodically verifies the training data, ensuring the integrity of the training data stored on the data server. Second, DML-DIV solves the privacy protection and key escrow problems. On the one hand, it adopts blinding technology to keep the training data private during public auditing. On the other hand, it adopts a two-step key generation scheme, which solves the key escrow problem and greatly reduces certificate management overhead. Third, DML-DIV uses public auditing to resist tampering and forgery attacks by network attackers and data servers. Finally, security and performance analyses show that DML-DIV is more secure and more efficient than other public auditing schemes.

Second, to address the secure recovery of incomplete training data in distributed machine learning, we propose a training data secure recovery scheme for distributed machine learning (DML-DR). First, DML-DR introduces a TPA to audit the training data and protect its integrity. Second, DML-DR uses network coding to encode the training data and stores the coded data on multiple data servers, which enables secure recovery: when public auditing reveals that training data have been damaged or lost, DML-DR can recover the lost or damaged data blocks. Third, DML-DR adopts blinding technology to keep the training data private during public auditing, and, like DML-DIV, it adopts two-step key generation to solve the key escrow problem. Finally, security and performance analyses show that DML-DR is more secure and more efficient than other schemes.

Finally, to address parameter integrity in the parameter server framework, we propose a data integrity protection scheme oriented to the parameter server framework (PS-PIV). First, PS-PIV guarantees the integrity of the parameters. On the one hand, it introduces a periodic TPA verification mechanism: the TPA regularly verifies the parameter data held by the parameter server, ensuring the integrity of the parameters it stores. On the other hand, it introduces a real-time verification mechanism: the parameter server and the working node each verify the integrity of parameters upon receiving them, ensuring the integrity of the parameters transmitted between the two. Second, PS-PIV provides privacy protection. On the one hand, blinding technology keeps the parameters private during public auditing. On the other hand, a hash algorithm keeps the parameters private during proxy signing. Finally, security and performance analyses show that PS-PIV is more secure and more efficient than other schemes.
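The periodic TPA audits in DML-DIV, DML-DR, and PS-PIV follow the usual challenge-response pattern of public auditing: the auditor challenges a random subset of blocks, and the server answers with one aggregated value instead of returning the data. The sketch below illustrates only that core, using the private-verification variant of the Shacham-Waters homomorphic authenticator as a swapped-in stand-in. It is not the thesis's construction: the real schemes are publicly verifiable (a third party audits without the owner's secret), blind the server's response so the TPA learns nothing about the data, and use two-step key generation, none of which is modelled here. The prime P, the HMAC-based PRF, and all function names are assumptions for illustration.

```python
import hashlib
import hmac
import secrets

# Assumption: a toy prime field; real public-auditing schemes use pairing-friendly groups.
P = 2**127 - 1


def prf(k, index):
    """PRF f_k(i) built from HMAC-SHA256 and reduced mod P."""
    mac = hmac.new(k, index.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(mac, "big") % P


def keygen():
    """Owner's secret audit key: a PRF key and a nonzero field element alpha."""
    return secrets.token_bytes(32), 1 + secrets.randbelow(P - 1)


def tag_blocks(key, blocks):
    """One tag per block: sigma_i = f_k(i) + alpha * m_i (mod P)."""
    k, alpha = key
    return [(prf(k, i) + alpha * m) % P for i, m in enumerate(blocks)]


def make_challenge(n_blocks, sample_size):
    """Auditor samples distinct block indices, each with a nonzero random coefficient nu_i."""
    indices = secrets.SystemRandom().sample(range(n_blocks), sample_size)
    return [(i, 1 + secrets.randbelow(P - 1)) for i in indices]


def prove(blocks, tags, challenge):
    """Data server answers with mu = sum(nu_i * m_i) and sigma = sum(nu_i * sigma_i)."""
    mu = sum(nu * blocks[i] for i, nu in challenge) % P
    sigma = sum(nu * tags[i] for i, nu in challenge) % P
    return mu, sigma


def verify(key, challenge, proof):
    """Accept iff sigma == sum(nu_i * f_k(i)) + alpha * mu (mod P)."""
    k, alpha = key
    mu, sigma = proof
    return sigma == (sum(nu * prf(k, i) for i, nu in challenge) + alpha * mu) % P


if __name__ == "__main__":
    key = keygen()
    blocks = [1000 + i for i in range(16)]  # stand-ins for training-data blocks (< P)
    tags = tag_blocks(key, blocks)
    chal = make_challenge(len(blocks), 16)  # challenge every block so the demo is deterministic
    print(verify(key, chal, prove(blocks, tags, chal)))  # True
    blocks[5] += 1                                       # the server silently alters block 5
    print(verify(key, chal, prove(blocks, tags, chal)))  # False
```

In practice only a sample of c out of n blocks is challenged, so a single corrupted block is caught with probability c/n per audit; repeating audits periodically drives the miss probability down.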
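DML-DR's recovery of damaged or lost training data relies on network coding. A common way to realise this, assumed here rather than taken from the thesis, is random linear network coding over a prime field: each data server stores a random linear combination of the k source blocks together with its coefficient vector, a lost coded block is regenerated as a fresh random combination of the surviving coded blocks, and any k linearly independent coded blocks decode the source by Gaussian elimination. The field size P, the block layout, and the function names are hypothetical, and the integrity tags and blinding that the thesis attaches to the stored data are omitted.

```python
import secrets

# Assumption: coding is done over the prime field GF(P); symbols must be < P.
P = 2**31 - 1


def combine(coded, coeffs):
    """Return sum(c * block) over both the coefficient vectors and the payloads."""
    k = len(coded[0][0])
    length = len(coded[0][1])
    vec, data = [0] * k, [0] * length
    for c, (cv, payload) in zip(coeffs, coded):
        for j in range(k):
            vec[j] = (vec[j] + c * cv[j]) % P
        for j in range(length):
            data[j] = (data[j] + c * payload[j]) % P
    return vec, data


def encode(blocks, n):
    """Turn k source blocks into n coded blocks, one per hypothetical data server."""
    k = len(blocks)
    unit = [([int(j == i) for j in range(k)], list(b)) for i, b in enumerate(blocks)]
    return [combine(unit, [secrets.randbelow(P) for _ in range(k)]) for _ in range(n)]


def repair(survivors):
    """Regenerate a lost coded block from the survivors without decoding the source."""
    return combine(survivors, [secrets.randbelow(P) for _ in survivors])


def decode(coded, k):
    """Gauss-Jordan elimination mod P on rows [coefficients | payload]."""
    rows = [list(cv) + list(payload) for cv, payload in coded]
    for col in range(k):
        pivot = next((r for r in range(col, len(rows)) if rows[r][col]), None)
        if pivot is None:
            raise ValueError("coded blocks do not span all k source blocks")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], P - 2, P)  # Fermat inverse, valid because P is prime
        rows[col] = [x * inv % P for x in rows[col]]
        for r in range(len(rows)):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(x - f * y) % P for x, y in zip(rows[r], rows[col])]
    return [row[k:] for row in rows[:k]]


if __name__ == "__main__":
    source = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # k = 3 toy training-data blocks
    servers = encode(source, n=5)               # 5 coded blocks on 5 data servers
    del servers[1]                              # data server 1 loses its block
    servers.append(repair(servers))             # regenerate it from the 4 survivors
    print(decode(servers[-3:], k=3) == source)  # True unless the random coefficients are singular
```

Random coefficients make a singular set of k coded blocks possible but, with P near 2^31, very unlikely; a real system would detect the `ValueError` and draw fresh coefficients.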
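For PS-PIV's real-time verification, one plausible reading of the abstract is a hash-then-sign check: the sender hashes the serialized parameters, only the digest is signed (so a signer such as a proxy never sees the raw values), and the receiver recomputes the digest and checks the tag. In the sketch below, HMAC-SHA256 with a pre-shared key stands in for the thesis's proxy signature, which is an assumption; a real signature scheme would let the receiver verify with a public key instead. The message layout (a version number followed by float64 values) and the function names are hypothetical.

```python
import hashlib
import hmac
import struct


def digest_params(version, params):
    """Hash one parameter update (version + float64 values) into a 32-byte SHA-256 digest."""
    payload = struct.pack(f">Q{len(params)}d", version, *params)
    return hashlib.sha256(payload).digest()


def sign_digest(key, digest):
    """Stand-in (assumption) for the proxy signature: authenticates only the digest."""
    return hmac.new(key, digest, hashlib.sha256).digest()


def verify_params(key, version, params, tag):
    """Receiver recomputes the digest from what it received and compares in constant time."""
    return hmac.compare_digest(sign_digest(key, digest_params(version, params)), tag)


if __name__ == "__main__":
    key = b"hypothetical-pre-shared-key"
    params = [0.12, -0.5, 3.0]  # a worker's parameter update
    tag = sign_digest(key, digest_params(7, params))
    print(verify_params(key, 7, params, tag))             # True
    print(verify_params(key, 7, [0.12, -0.5, 3.1], tag))  # False: tampered value
```

Binding the version number into the digest is a design choice that stops a stale parameter update from being replayed as a current one.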
Keywords/Search Tags:Distributed machine learning, parameter server, public auditing, integrity protection, key escrow issues, privacy protection, network coding