
Research on Privacy Protection of Training Data Based on Knowledge Distillation

Posted on: 2022-08-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y P Yin
GTID: 2518306569994809
Subject: Computer Science and Technology

Abstract/Summary:
In recent years, deep learning algorithms trained on large-scale datasets have been widely deployed, but they also carry the risk of leaking the original data, especially private information, which seriously threatens the security of individuals, companies, and nations. Privacy can leak at the data collection, data release, model training, and model release stages of a deep learning system, so protecting data privacy and preventing privacy threats has significant research and application value. Focusing on the risk of privacy leakage at the model release stage, this thesis studies privacy protection methods for training data based on knowledge distillation. Against the background of patient data privacy in smart medical applications, the thesis studies privacy protection methods for training data in the single-data-source and multiple-data-source scenarios respectively. The main contents are as follows.

First, privacy protection of deep learning training data in the single-data-source scenario. Addressing the limitation that existing protection methods are tied to a specific deep neural network structure, this thesis studies a privacy protection framework for training data based on knowledge distillation. The framework uses the teacher-student paradigm to build a barrier between the teacher model and the student model: the student is trained without relying on any private data, which protects the training data used to build the teacher model. For the specific task of named entity recognition with deep learning, the thesis combines word-level and structure-level knowledge distillation under this framework, protecting the privacy of the teacher's training data while keeping the student's performance as consistent with the teacher's as possible (see the word-level distillation sketch below). Experimental results for named entity recognition on the Chinese GLUE electronic medical record dataset show that the method achieves effective privacy protection of the training data; meanwhile, the F1 score of the student model decreases by only 0.67% compared with the teacher model, which makes it highly practical.

Second, privacy protection of deep learning training data in the multiple-data-source scenario. This thesis studies a privacy protection framework for training data based on multi-teacher distillation. The framework trains a teacher model independently on each data source and then distills the knowledge of the multiple teachers into a single student model with the help of public, non-private data, thereby protecting the privacy of the training data at every source. To handle heterogeneous knowledge across data sources, the thesis derives and applies the probability relationship between heterogeneous knowledge to transfer knowledge from the multiple teachers to the student, which effectively reduces the student's performance loss (see the multi-teacher aggregation sketch below). Experimental results for named entity recognition on the Chinese GLUE electronic medical record dataset show that the method protects the privacy of training data from multiple sources, and the F1 score of the student model decreases by only 0.92%, balancing privacy and performance in the multi-source setting.
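The abstract does not spell out the distillation losses. As a rough illustration of the word-level component in the single-source setting, the following PyTorch sketch matches the student's softened per-token label distribution to the teacher's via KL divergence; the function name, temperature value, and tensor shapes are illustrative assumptions, and the structure-level term is omitted.

```python
import torch.nn.functional as F

def word_level_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Hinton-style per-token distillation for sequence labeling.

    Both logits tensors are assumed to have shape
    (batch, sequence_length, num_labels).
    """
    t = temperature
    # Soften both distributions; the teacher supplies the soft targets.
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    # Scale by t^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t ** 2)
```

In the framework described above, such a loss would only ever be evaluated on inputs that are not private, so the student never observes the teacher's training data directly.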
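For the multi-source setting, the thesis derives a probability relationship to reconcile heterogeneous knowledge from different teachers; that derivation is not given in the abstract. The sketch below shows only the simplest baseline aggregation, a uniform average of softened teacher distributions on public data; the function, the uniform weighting, and the assumption of a shared label set are all illustrative and do not reproduce the thesis's method.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def aggregate_teacher_targets(teachers, public_inputs, temperature=2.0):
    """Uniformly average teacher label distributions on non-private inputs.

    Each teacher is assumed to be a callable returning per-token logits of
    shape (batch, sequence_length, num_labels) over a shared label set.
    """
    probs = [
        F.softmax(teacher(public_inputs) / temperature, dim=-1)
        for teacher in teachers
    ]
    # Stand-in for the thesis's derived relationship between
    # heterogeneous teacher knowledge: a plain uniform average.
    return torch.stack(probs, dim=0).mean(dim=0)
```

The student can then be trained against these aggregated soft targets with the same KL-divergence loss as in the single-source sketch, touching only the public data.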
Keywords/Search Tags:privacy protection, training data, knowledge distillation, deep learning