In recent years, data analysis based on machine learning has achieved unprecedented success in many fields, and the industrial value of big data has been fully exploited. However, the growth rate of computer hardware capacity lags far behind the explosive growth of the data that machine learning must process, so local or terminal data are migrated to cloud servers. The "data island" effect and the high risk of data privacy leakage, however, limit the development of the centralized learning mode. In contrast, collaborative machine learning allows local nodes to train models without sharing their data and is widely used in medical, financial, enterprise, and other practical applications. It should be noted that collaborative learning systems face many security threats and privacy disclosure risks. First, the cloud server, as the key component of a collaborative learning system, faces traditional attack threats. Second, the intermediate results exchanged between local nodes and the cloud server are widely exploited by attackers to steal users' private data. Third, the presence of untrusted third parties in practical applications aggravates these security risks. In this context, the study of data security and privacy protection in collaborative learning is a fundamental issue for the development of machine learning and a core issue for the sustainable development of the big data industry.

This thesis studies differential privacy protection techniques for multi-party collaborative learning in the cloud environment, aiming to ensure data privacy while further improving system performance and model quality. It discusses differential privacy protection schemes in the "local-cloud" and "edge-cloud" scenarios from three aspects: privacy protection of the iterative training and optimization process, privacy budget allocation for high-dimensional data, and incentive mechanism design under privacy metrics. For each scheme, the thesis presents the system model, security threat analysis, design objectives, scheme design, theoretical analysis, and performance evaluation. The main research contents and contributions of this thesis are summarized as follows:

(1) To protect the intermediate results of the iterative training process, this thesis proposes a distributed differentially private chaotic quantum particle swarm optimization algorithm. Existing stochastic gradient descent (SGD) algorithms suffer from the difficulty of computing the target gradient and the tendency of the optimization target to fall into local optima. The proposed algorithm optimizes the training iteration process and protects the privacy of intermediate results in the "local-cloud" collaborative learning scenario. The local model is trained by adaptive chaotic quantum particle swarm optimization. Local nodes do not submit their local data to the cloud server directly; instead, they submit local model updates under differential privacy protection, which ensures that the system can effectively resist differential attacks and inference attacks. To evaluate the feasibility of the algorithm, the thesis further provides a theoretical analysis of its security and convergence and verifies its performance through experiments. The experimental results show that the proposed algorithm achieves an effective balance between privacy protection strength and model optimization.

(2) To optimize the privacy budget allocation for data protection in collaborative learning, this thesis proposes an adaptive privacy budget allocation algorithm based on feature differences. Existing differential privacy algorithms for high-dimensional data suffer from excessive consumption of the privacy budget. A privacy protection scheme is then proposed for the whole life cycle of high-dimensional data in the "edge-cloud" collaborative learning scenario. Theoretical analysis shows that the algorithm enables the system to effectively resist differential attacks, inference attacks, and tampering attacks. The experimental results show that the algorithm not only meets the privacy protection requirements but also achieves higher model accuracy under the same privacy protection strength.

(3) To address data privacy preservation and income fairness for high-quality participants, this thesis proposes a Stackelberg game incentive mechanism based on data quality pricing. Existing incentive mechanisms ignore the privacy loss of participants. The proposed mechanism encourages more edge nodes with high-quality data to participate in the "edge-cloud" collaborative learning scenario while also satisfying differential privacy protection. The cloud server, as the task publisher, comprehensively considers the resource contribution and privacy loss of the participants and gives corresponding rewards. Theoretical analysis and experimental results show that the game process, which takes data privacy protection as the goal and incorporates privacy metrics, can reach a Nash equilibrium. The proposed mechanism not only improves the security and revenue fairness of participants but also improves the service quality of the cloud. These results benefit the safe and sustainable development of collaborative learning.
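The differentially private submission of local model updates in contribution (1) can be illustrated with a minimal sketch. This is not the thesis's chaotic quantum particle swarm algorithm; it shows only the generic step of clipping a local update and adding calibrated noise before upload, assuming the standard Gaussian mechanism with L2-norm clipping. All names (`privatize_update`, `clip_norm`) are illustrative.

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, epsilon=1.0, delta=1e-5, rng=None):
    """Clip a local model update to a bounded L2 norm, then add Gaussian
    noise calibrated for (epsilon, delta)-differential privacy."""
    rng = np.random.default_rng() if rng is None else rng
    update = np.asarray(update, dtype=float)
    # Bound each node's contribution, so the update's sensitivity is clip_norm.
    norm = np.linalg.norm(update)
    if norm > clip_norm:
        update = update * (clip_norm / norm)
    # Textbook Gaussian-mechanism noise scale for sensitivity clip_norm.
    sigma = clip_norm * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return update + rng.normal(0.0, sigma, size=update.shape)
```

Because only the noisy, norm-bounded update leaves the node, an observer of the exchanged intermediate results cannot confidently infer any single record, which is the property the thesis relies on to resist differential and inference attacks.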
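The idea behind contribution (2), allocating a fixed total privacy budget unevenly across feature dimensions according to feature differences, can be sketched as follows. The thesis's actual difference measure is not specified in the abstract; this sketch assumes sample variance as a stand-in score, and relies on sequential composition so that the per-feature budgets sum to the total.

```python
import numpy as np

def allocate_budget(features, total_epsilon):
    """Split a total privacy budget across feature dimensions in proportion
    to a per-feature difference score (here: sample variance), so that
    sequential composition consumes exactly total_epsilon."""
    X = np.asarray(features, dtype=float)
    scores = X.var(axis=0) + 1e-12      # small floor avoids an all-zero split
    weights = scores / scores.sum()
    return total_epsilon * weights      # per-feature epsilons, sum == total
```

Features that vary more (and thus carry more information worth protecting accurately) receive a larger share of the budget, while near-constant features are not allowed to waste it, which is the intuition behind avoiding the excessive budget consumption of uniform allocation in high dimensions.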
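The Stackelberg structure of contribution (3) can be illustrated with a toy model, assuming a deliberately simple utility: each edge node (follower) chooses a contribution level x to maximize price*quality*x minus a quadratic privacy cost, and the cloud server (leader) sets the unit price anticipating those best responses. The functional forms and names here are hypothetical, not the thesis's pricing model.

```python
def follower_best_response(price, quality, privacy_cost):
    """Follower utility u(x) = price*quality*x - privacy_cost*x**2;
    du/dx = 0 gives the unique best response x* = price*quality/(2*privacy_cost)."""
    return price * quality / (2.0 * privacy_cost)

def leader_payoff(price, qualities, costs, value_per_unit=1.0):
    """Cloud server's payoff: value of the collected quality-weighted
    contributions minus the total payments to participants."""
    payoff = 0.0
    for q, c in zip(qualities, costs):
        x = follower_best_response(price, q, c)
        payoff += (value_per_unit - price) * q * x
    return payoff

def stackelberg_price(qualities, costs, value_per_unit=1.0):
    """Leader's move: grid-search the price that maximizes its payoff,
    anticipating the followers' best responses."""
    prices = [i / 100.0 for i in range(1, 101)]
    return max(prices, key=lambda p: leader_payoff(p, qualities, costs, value_per_unit))
```

In this toy model the followers' best responses form a Nash equilibrium for any announced price, and the leader's payoff (value_per_unit - price) * price * K is maximized at half the unit value; nodes with higher data quality or lower privacy cost contribute more and earn more, mirroring the fairness property the abstract claims.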