
Privacy Protection For Chinese Input Via Federated Learning

Posted on: 2022-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: Z R Li
GTID: 2518306569497554
Subject: Computer technology
Abstract/Summary:
The development of machine learning has brought economic benefits to society and has had a huge impact on people's work and life. At the same time, it poses challenges to personal privacy. The effectiveness of machine learning depends on the available training data. When users enjoy intelligent services built on machine learning, they often produce a large amount of raw data, and service providers infer users' preferences from these data in order to improve service quality. However, collecting, storing, and training on these data greatly increases privacy risk: malicious attackers can infer private information by analyzing users' data. As one of the main windows through which users interact with the Internet, input software handles a large amount of user data, so its privacy and security issues must be taken seriously. Chinese has more than 1.5 billion users, but research on privacy protection for Chinese input is very scarce.

The federated learning mechanism proposed by Google can use users' private contributions to train models without collecting raw data. It transfers model parameters instead of training data, so users' data stays on the local device. Unlike other privacy protection schemes, it provides intuitive and strong privacy for users' data. However, some studies have shown that raw training data can be inferred from the intermediate parameters of federated learning. Researchers have recently studied privacy in federated learning, but existing solutions often rely on a server to perform the relevant steps and do not work well if the server is untrusted. To solve these problems, this dissertation proposes a privacy protection technology for Chinese input via federated learning, which uses federated learning to train Chinese language models in order to balance the privacy and usability of users' input data.

This dissertation presents a training method for Chinese language models based on federated learning (FedLM). FedLM uses the input data stored locally on the device to train the language model while protecting users' privacy. In each round of FedLM training, the server sends the language model to users, and each user trains the model on historical input data and sends back the updates. The server then aggregates the private contributions of all users to update the global model. Because the language model is trained locally, users' input does not need to be desensitized, a step that would reduce its usability. In this process, the privacy of users' input data can be effectively guaranteed.

Because the information communicated in FedLM may still expose users' input data, this dissertation proposes a privacy-enhanced federated learning scheme with randomized response and local adaptive differential privacy (RR-LADP) to strengthen the privacy of FedLM. RR-LADP improves the traditional federated learning mechanism to address the problem that untrusted servers and malicious attackers may infer users' input data from the intermediate parameters of FedLM. RR-LADP adopts two mechanisms: the randomized response (RR) mechanism and the local adaptive differential privacy (LADP) mechanism. RR reduces the correlation between the final model and the contribution of any specific user by perturbing the user selection mechanism. LADP adds noise to the local updates to achieve privacy. Both mechanisms are carried out independently by each user, so RR-LADP remains effective even when the server is untrusted.

Experiments show that federated learning can train a language model with good performance. For the input prediction function, FedLM can train the model locally on users' original input data, so its recall rate is often higher than that of a language model trained on a central server (CSLM). In addition, compared with the centralized differential privacy mechanism (CDP), RR-LADP better protects the privacy of training data in the federated learning process.
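
The round structure described above (server sends the model, users compute updates locally, each user perturbs their own participation and their own update, and the server averages what it receives) can be illustrated with a small sketch. This is not the dissertation's implementation. The abstract does not say how LADP adapts its noise or how RR is parameterized, so the sketch substitutes fixed L2 clipping with fixed Gaussian noise, and a simple coin-flip randomized response on participation. All function names and parameters (`p_truth`, `clip_norm`, `noise_std`) are hypothetical, and local training is replaced by precomputed update vectors.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale the update so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    if norm <= clip_norm or norm == 0.0:
        return list(update)
    scale = clip_norm / norm
    return [x * scale for x in update]


def randomized_participation(selected, p_truth, rng):
    """Assumed RR rule: follow the server's selection with probability
    p_truth, otherwise decide by a fair coin, so the server cannot be
    sure whether a given user really contributed."""
    if rng.random() < p_truth:
        return selected
    return rng.random() < 0.5


def privatize_update(update, clip_norm, noise_std, rng):
    """Stand-in for LADP: clip locally, then add Gaussian noise.
    The noise scale is fixed here, not adaptive."""
    clipped = clip_update(update, clip_norm)
    return [x + rng.gauss(0.0, noise_std) for x in clipped]


def aggregate(global_weights, updates):
    """Server step: add the mean of the received updates to the model."""
    if not updates:
        return list(global_weights)
    n = len(updates)
    return [w + sum(u[i] for u in updates) / n
            for i, w in enumerate(global_weights)]


def run_round(global_weights, local_updates, selected_flags,
              p_truth=0.8, clip_norm=1.0, noise_std=0.1, seed=0):
    """One federated round; both privacy steps run on the user side."""
    rng = random.Random(seed)
    reported = []
    for update, selected in zip(local_updates, selected_flags):
        if randomized_participation(selected, p_truth, rng):
            reported.append(
                privatize_update(update, clip_norm, noise_std, rng))
    return aggregate(global_weights, reported)
```

Because the participation decision and the noise are both applied before anything leaves the device, the server in this sketch only ever sees perturbed updates, which matches the abstract's point that the scheme does not depend on a trusted server.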
Keywords/Search Tags: privacy protection, federated learning, differential privacy, Chinese language model, input prediction