
Research On Data Cleaning And Joint Learning For Privacy Protection Of Multiple Data Sources

Posted on: 2020-12-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yang
GTID: 2428330602950371
Subject: Information security
Abstract/Summary:
Machine learning is now widely used in many fields and has brought considerable convenience to people's lives. The size and quality of the data set are among the keys to training a machine learning model: enlarging the data set and covering a more complete range of training samples directly improves model performance. Because much of the data in today's big data environment is held by different owners, training machine learning models across data sets has become a trend. Cross-dataset training involves multiple data sources, so data cleaning and model training algorithms that combine multi-party data are a key issue when several parties jointly build a model, and the data privacy problems caused by pooling data sets cannot be ignored. Secure multi-party computation, a family of cryptographic techniques designed for multiple participants, is well suited to this scenario. It lets the participants compute an agreed function while keeping each participant's input private. Techniques such as secret sharing and garbled circuits support basic operations such as addition, subtraction, multiplication, division, and comparison.

This thesis applies secure multi-party computation to the different stages of building a machine learning model and designs privacy-preserving algorithms for joint data cleaning and joint model training over multiple data sources.

First, for the problem of cleaning data pooled from multiple data sources, the thesis designs a privacy-preserving cleaning algorithm. It improves the AVF data cleaning algorithm, combines secret sharing with Yao's garbled circuits to perform both arithmetic and comparison on ciphertexts, and uses a sorting circuit to reduce the complexity of ciphertext sorting. The main goal is to prevent the data privacy leakage that may occur when data from multiple sources is cleaned. Simulation results on a public data set and a manually adjusted data set demonstrate the feasibility and effectiveness of the proposed algorithm.

Second, for multi-party joint model training after data cleaning is complete, the thesis designs a privacy-preserving model training algorithm. It uses secret sharing to encrypt key parameters and has a third party add noise to the parties' encrypted parameters. Centralized parameter processing improves the accuracy of the final model and the uniformity and controllability of the noise, and makes the final trained model robust to model inversion attacks. Simulation experiments on the MNIST dataset show how the proposed scheme performs under differential privacy noise of different scales and demonstrate the effectiveness of the proposed algorithm.
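The abstract names secret sharing as the building block for arithmetic on protected values. As a rough illustration only, the sketch below shows plain additive secret sharing and share-wise addition. It is not the thesis's protocol: the modulus, the function names (`share`, `reconstruct`, `add_shared`), and the three-party setup are all assumptions chosen for the example.

```python
import secrets

# Assumed ring size for this illustration; the thesis does not specify one here.
MODULUS = 2**61 - 1


def share(value, n_parties, modulus=MODULUS):
    """Split `value` into n additive shares that sum to it modulo `modulus`."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def reconstruct(shares, modulus=MODULUS):
    """Recover the secret by summing all shares modulo `modulus`."""
    return sum(shares) % modulus


def add_shared(a_shares, b_shares, modulus=MODULUS):
    """Each party adds its own two shares locally; no communication is needed."""
    return [(a + b) % modulus for a, b in zip(a_shares, b_shares)]


# Hypothetical usage: two private inputs, three parties.
x_shares = share(25, 3)
y_shares = share(17, 3)
z_shares = add_shared(x_shares, y_shares)
total = reconstruct(z_shares)
```

Any single share is a uniformly random ring element and reveals nothing about the input on its own. Addition is local, whereas multiplication and comparison need extra interaction, which is one reason schemes of this kind pair secret sharing with garbled circuits for the comparison steps.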
Keywords/Search Tags: Secure multi-party computation, Differential privacy, Joint training, Deep learning, Sorting network