
Research On Data Classification And Clustering Protection Algorithm Based On Differential Privacy

Posted on: 2022-02-08   Degree: Master   Type: Thesis
Country: China   Candidate: B Y Leng
GTID: 2518306524998889   Subject: Computer application technology
Abstract/Summary:
As big data has swept through every field, data are being collected and used more and more frequently. Classifying and clustering the collected data reveals their internal properties and patterns, which in turn supports better applications and services. However, data collection inevitably involves users' private data. Classification and clustering models trained in machine learning often fail to balance the prediction accuracy of the model against the privacy of the training data set. To address this problem within the differential privacy framework, this paper reviews the privacy protection methods proposed for machine learning algorithms in recent years, analyzes their advantages and disadvantages, and proposes corresponding solutions. For the problem of data privacy leakage when training classification and clustering models, the research work proposes the following privacy protection algorithms:

1. For privacy-preserving classification of multi-class image data sets, this paper proposes a method that combines the differential privacy framework, a residual network, and a two-stage (secondary) training scheme. A first model (the teacher model) is trained, the Laplace mechanism injects random noise into statistics of its predicted results, and this knowledge is then transferred to the training of a second classification model (the student model). An analysis of the privacy budget over the entire training process, together with comparative experiments, shows that the proposed classification algorithm preserves model accuracy while protecting data privacy.

2. For privacy-preserving k-means clustering, this paper proposes a method that combines k-means||, an improved variant of k-means, with differential privacy. The work studies privacy leakage from the perspective of center-point selection during clustering and improves the selection of the initial cluster centers. Differential privacy mechanisms are applied both when the initial centers are selected and when the cluster centers are iteratively updated. An analysis of the privacy budget over the entire clustering process, together with experiments that dynamically adjust the budget to balance clustering accuracy against data utility, shows that the proposed algorithm preserves clustering accuracy while protecting the privacy of the clustered data.

This paper analyzes where data can leak while classification and clustering models are trained, and integrates differential privacy, which assumes an attacker with maximal background knowledge, into both training processes. The result is a set of methods that prevent privacy leakage during data classification and clustering while retaining the accuracy of model predictions.
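Item 1 describes adding Laplace noise to statistics of the teacher's predictions before passing labels to a student model. The abstract does not give the details, so the following is only a minimal Python sketch under assumptions borrowed from the PATE framework: an ensemble of teachers trained on disjoint data partitions, per-class vote counts as the noised statistic, and plain sequential composition for the budget. The function names `noisy_vote_label` and `label_student_queries` are hypothetical, and the residual-network training itself is omitted.

```python
import numpy as np


def noisy_vote_label(teacher_preds, num_classes, epsilon, rng):
    """Release one label for a public query via Laplace-noised teacher votes."""
    # Per-class vote counts: the statistic that gets perturbed.
    counts = np.bincount(teacher_preds, minlength=num_classes).astype(float)
    # Assumption: teachers see disjoint partitions, so one private record can move
    # at most one vote from one class to another (L1 sensitivity of counts = 2).
    counts += rng.laplace(0.0, 2.0 / epsilon, size=num_classes)
    # Taking the argmax is post-processing, so this single query is epsilon-DP.
    return int(np.argmax(counts))


def label_student_queries(teacher_pred_matrix, num_classes, epsilon_per_query, seed=0):
    """teacher_pred_matrix has shape (num_teachers, num_queries)."""
    rng = np.random.default_rng(seed)
    num_queries = teacher_pred_matrix.shape[1]
    labels = np.array([
        noisy_vote_label(teacher_pred_matrix[:, q], num_classes, epsilon_per_query, rng)
        for q in range(num_queries)
    ])
    # Basic sequential composition: the budgets of all answered queries add up.
    total_epsilon = epsilon_per_query * num_queries
    return labels, total_epsilon


# Hypothetical example: 5 teachers, 2 public queries, 3 classes.
preds = np.array([[0, 1], [0, 1], [0, 2], [0, 1], [1, 1]])
labels, eps_total = label_student_queries(preds, num_classes=3, epsilon_per_query=1.0)
print(labels, eps_total)
```

The student would then be trained only on public inputs paired with these noisy labels, so it never sees the private data directly. Summing per-query budgets is a loose upper bound; PATE-style analyses usually apply tighter accounting.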
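Item 2 builds on k-means||, the scalable k-means++ seeding of Bahmani et al. ("Scalable K-Means++"). For reference, a minimal non-private NumPy sketch of that seeding follows: candidate centers are oversampled over a few rounds with probability proportional to squared distance, each candidate is weighted by the number of points closest to it, and the weighted candidates are reclustered with k-means++. The thesis's differentially private changes to center selection are not described in the abstract and are not reproduced here; the function names and the parameters `ell` and `rounds` are illustrative choices.

```python
import numpy as np


def sq_dist_to_centers(X, C):
    # Squared Euclidean distance from each row of X to its nearest row of C,
    # plus the index of that nearest center.
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1), d2.argmin(axis=1)


def weighted_kmeanspp(P, w, k, rng):
    # k-means++ seeding on candidate points P with nonnegative weights w.
    centers = [P[rng.choice(len(P), p=w / w.sum())]]
    for _ in range(1, k):
        d2, _ = sq_dist_to_centers(P, np.array(centers))
        probs = w * d2
        if probs.sum() == 0:  # every weighted candidate already coincides with a center
            probs = w.astype(float)
        centers.append(P[rng.choice(len(P), p=probs / probs.sum())])
    return np.array(centers)


def kmeans_parallel_init(X, k, ell, rounds, seed=0):
    rng = np.random.default_rng(seed)
    C = X[rng.integers(len(X))][None, :]  # start from one uniformly random point
    for _ in range(rounds):               # the original analysis uses O(log psi) rounds
        d2, _ = sq_dist_to_centers(X, C)
        phi = d2.sum()                    # current clustering cost
        if phi == 0:
            break
        # Oversampling step: keep each point independently with prob ell * d2 / phi.
        keep = rng.random(len(X)) < np.minimum(1.0, ell * d2 / phi)
        C = np.vstack([C, X[keep]])
    # Weight each candidate by how many points of X it is closest to.
    _, nearest = sq_dist_to_centers(X, C)
    w = np.bincount(nearest, minlength=len(C)).astype(float)
    return weighted_kmeanspp(C, w, k, rng)


# Hypothetical example: three well-separated groups of points.
X = np.array([[0.0, 0.0]] * 4 + [[100.0, 0.0]] * 4 + [[0.0, 100.0]] * 4)
print(kmeans_parallel_init(X, k=3, ell=10, rounds=5))
```

These seeds would normally be refined by Lloyd iterations; in a private pipeline, both this seeding and the later updates would have to consume privacy budget.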
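The privacy side of item 2 can be illustrated with a common differentially private variant of Lloyd's iterations. This is not necessarily the thesis's algorithm. The sketch assumes records clipped to [-1, 1]^d and add/remove-one neighbouring data sets, so the per-cluster counts have L1 sensitivity 1 and the per-cluster coordinate sums have L1 sensitivity d. The total budget `epsilon` is split evenly across iterations and, within each iteration, evenly between counts and sums. To keep the example short, the initial centers are drawn uniformly from the domain (data-independent, so they cost no budget) in place of a private k-means|| seeding; `dp_kmeans` is a hypothetical name.

```python
import numpy as np


def dp_kmeans(X, k, epsilon, iters, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    X = np.clip(X, -1.0, 1.0)                       # enforce the assumed bounded domain
    centers = rng.uniform(-1.0, 1.0, size=(k, d))   # data-independent seeding: no privacy cost
    eps_iter = epsilon / iters                      # sequential composition over iterations
    for _ in range(iters):
        # Assign each point to its nearest current center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        counts = np.bincount(labels, minlength=k).astype(float)
        sums = np.zeros((k, d))
        np.add.at(sums, labels, X)
        # Half of eps_iter for counts (sensitivity 1), half for sums (sensitivity d).
        noisy_counts = counts + rng.laplace(0.0, 2.0 / eps_iter, size=k)
        noisy_sums = sums + rng.laplace(0.0, 2.0 * d / eps_iter, size=(k, d))
        # Keep the old center when the noisy count is too small to divide by safely.
        ok = noisy_counts > 1.0
        centers[ok] = np.clip(noisy_sums[ok] / noisy_counts[ok][:, None], -1.0, 1.0)
    return centers  # only the centers are released; the labels above are not private


# Hypothetical example: two synthetic groups inside [-1, 1]^2.
rng = np.random.default_rng(42)
data = np.vstack([rng.normal(-0.5, 0.1, (200, 2)), rng.normal(0.5, 0.1, (200, 2))])
print(dp_kmeans(data, k=2, epsilon=1.0, iters=5))
```

Raising `epsilon` shrinks the noise scales and moves the result toward ordinary k-means. This is the accuracy-versus-privacy trade-off that the abstract describes tuning by adjusting the privacy budget.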
Keywords/Search Tags: machine learning, differential privacy, classification, clustering, Laplace mechanism, k-means