Research On Key Technologies Of Secure Data Processing Based On Privacy-preserving Computation

Posted on: 2022-09-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Fang
GTID: 1488306731497944
Subject: Software engineering
Abstract/Summary:
The rapid development of new-generation information technologies such as cloud computing, the Internet of Things, and the Internet of Vehicles has brought the world into the era of the digital economy. Data of many types has grown rapidly and is widely distributed. With the help of artificial intelligence algorithms, the value contained in the data is exploited to promote the digitalization and intelligent transformation of all walks of life. However, analyzing and processing data has also exposed a series of data security issues. First, once data is released it is out of the owner's control, and users cannot prevent attackers from inferring and using the private information it contains. Second, the amount of data held by a single user is limited, and it is difficult to protect privacy during data collaboration among multiple users with limited computing and communication resources. Third, users in open networks lack mutual trust, and a trusted third party is usually exposed to single-point-of-failure attacks, so a decentralized and secure data sharing mechanism is still lacking. Privacy-preserving computation technology, represented by federated learning, secure multi-party computation, and trusted execution environments, can realize the "availability and invisibility" of data (data can be used without being exposed), which offers an effective approach to the above problems. To this end, this dissertation takes privacy-preserving computation as the basic tool and studies how to process and analyze data while ensuring privacy from three aspects: privacy-preserving data publishing, efficient and privacy-preserving data collaboration, and secure and credible data sharing. It addresses problems of efficiency, cost, and reliability, among others. The main contributions and innovations are summarized as follows:

(1) Privacy-preserving data publishing based on generative adversarial networks

Traditional privacy-preserving data publishing methods usually need specific rules designed for each dataset to handle private attributes, or they reduce the availability of high-dimensional data. To this end, this dissertation proposes a differentially private data publishing method based on a generative adversarial network. The generative adversarial network model automatically learns important features of the original data, and differentially private noise is added to the model gradient, so that the method generates synthetic data that is similar to the original data and protects privacy, without the need to design desensitization rules for each dataset. To improve the convergence rate of the model and the quality of the generated data, three optimization strategies are designed: dynamic privacy budget allocation, adaptive clipping threshold selection, and weight parameter clustering. Security analysis and experimental results show that the data synthesized by this method has high availability and strictly satisfies differential privacy. The privacy loss is independent of the amount of data, so the method is suitable for the privacy-preserving publishing of large datasets, such as medical data and economic data.

(2) Efficient and privacy-preserving data collaboration based on federated learning

When user data is stored in a distributed manner, traditional data collaboration methods struggle to protect data privacy effectively. This dissertation uses federated learning as the basic tool to extract feature models from the original data through distributed local training on the user side, and transforms collaborative computing on the original data into collaborative modeling based on feature models, thereby reducing privacy leakage and achieving data collaboration among multiple users. Specifically, this dissertation proposes two efficient and privacy-preserving data collaboration methods based on federated learning for different application scenarios.

First, for bandwidth-constrained edge computing scenarios such as smart homes and the industrial Internet of Things, a sparse bidirectional compression algorithm is designed to filter out irrelevant gradients that deviate from the global convergence trend. Because the upload bandwidth is usually lower than the download bandwidth, different compression operators are used on the device side and the server side to reduce the communication overhead. To address the insufficient privacy protection of the original federated learning framework, a novel privacy-preserving protocol based on secret sharing and homomorphic encryption is designed, which not only protects the data privacy of a single device but also resists collusion among some devices.

Second, for cloud computing scenarios with high latency, such as interest recommendation and collaborative marketing, an efficient federated training strategy is proposed, which improves training efficiency by increasing the amount of local computation, sharing parameters selectively, and selecting users dynamically. In addition, in view of the high computational overhead of existing privacy-preserving protocols, a lightweight privacy-preserving protocol is designed, which achieves strong privacy protection with fewer communication rounds and lower computational overhead, and resists semi-honest users and servers.

Security analysis and experimental results show that the two methods strictly protect the data privacy of distributed users and are superior to existing methods in terms of data availability, computational overhead, communication overhead, and training efficiency.

(3) Secure and credible data sharing based on blockchain and federated learning

Data sharing can effectively improve resource utilization, but it faces problems such as privacy leakage, lack of trust among users, and the central server being a single point of failure. This dissertation combines blockchain with federated learning to establish a decentralized and credible data sharing framework, which is suitable for scenarios such as the Internet of Vehicles and mobile communication networks. The "availability and invisibility" of data is realized by sharing the data model instead of the original data, and the data transaction process is recorded on the blockchain in a transparent and immutable way, making the whole data sharing process verifiable, traceable, and auditable. Specifically, this dissertation proposes two secure and credible data sharing methods based on blockchain and federated learning for different attack settings.

First, to counter poisoning attacks by malicious users, a gradient verification and incentive mechanism is designed to ensure the availability of the model and to encourage reliable users to share high-quality data. Because the transparency of on-chain data conflicts with data privacy requirements, an adaptive differential privacy mechanism is proposed to provide strong privacy protection with little loss of data utility.

Second, to counter tampering attacks by malicious miners, global gradient verification based on homomorphic commitment is incorporated into the consensus protocol to ensure the correctness of the federated learning model in each round of training. Because the differential privacy mechanism reduces data utility to some extent, a secure aggregation protocol based on gradient masking and verifiable secret sharing is designed. Even if some users drop out during training, the privacy of the other users' data is still protected.

Security analysis and experimental results show that the two methods can motivate reliable users to participate in data sharing, and that they resist poisoning attacks in the first case and tampering attacks and user dropouts in the second.
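
As a rough illustration of the gradient-masking idea mentioned for the secure aggregation protocol, the Python sketch below shows how pairwise additive masks can hide each user's update while cancelling out in the aggregate. It is not the dissertation's protocol. The function names, the modulus `MODULUS`, the integer-encoded updates, and the single seeded generator standing in for pairwise key agreement are all assumptions made for illustration, and verifiable secret sharing and dropout recovery are omitted.

```python
import random

MODULUS = 2**32  # assumed working modulus for integer-encoded gradients


def pairwise_masks(num_users, dim, seed=0):
    """Return one mask per user; all masks sum to zero modulo MODULUS."""
    # A single seeded generator stands in for per-pair shared secrets
    # (hypothetical simplification; real users would agree on seeds pairwise).
    rng = random.Random(seed)
    masks = [[0] * dim for _ in range(num_users)]
    for i in range(num_users):
        for j in range(i + 1, num_users):
            shared = [rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                # User i adds the shared value, user j subtracts it.
                masks[i][k] = (masks[i][k] + shared[k]) % MODULUS
                masks[j][k] = (masks[j][k] - shared[k]) % MODULUS
    return masks


def mask_update(update, mask):
    """Hide a user's update by adding that user's mask modulo MODULUS."""
    return [(u + m) % MODULUS for u, m in zip(update, mask)]


def aggregate(masked_updates):
    """Sum the masked updates; the pairwise masks cancel in the total."""
    dim = len(masked_updates[0])
    total = [0] * dim
    for vec in masked_updates:
        for k in range(dim):
            total[k] = (total[k] + vec[k]) % MODULUS
    return total


# Three users with small integer-encoded gradient vectors (made-up values).
updates = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
masks = pairwise_masks(len(updates), dim=3)
masked = [mask_update(u, m) for u, m in zip(updates, masks)]
result = aggregate(masked)  # equals the element-wise sum of the raw updates
```

In this sketch the aggregator only ever sees the masked vectors, yet the sum it computes matches the sum of the raw updates. A deployed protocol would additionally need a way to reconstruct the masks of users who drop out, which is the role the abstract assigns to verifiable secret sharing.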
Keywords/Search Tags: Privacy-preserving computation, Federated learning, Blockchain, Data collaboration, Data sharing