Research On Privacy Preserving Record Linkage In Data Integration

Posted on: 2024-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Yao
Full Text: PDF
GTID: 2568307103473534
Subject: Network and information security
Abstract/Summary:
As big data becomes an increasingly strategic resource, the shortcomings of existing data integration systems have become an obstacle to its high-quality development. The tension between privacy protection and data sharing is particularly prominent. Traditional data integration methods generally require data from different sources to be brought together, which can easily lead to data leakage and privacy infringement and therefore raises the requirements for privacy protection. In response, the academic community has proposed privacy preserving record linkage (PPRL) to address privacy and security in cross-domain, cross-department, and cross-industry data integration. However, existing PPRL techniques still fall short in linkage quality and time performance. Traditional PPRL algorithms require large amounts of computing resources and storage space when dealing with large-scale data, which leads to poor time performance. They are also prone to linkage errors when dealing with high-noise data, which leads to low linkage quality. To address these issues, this dissertation proposes two improved privacy preserving record linkage algorithms. Its contents and contributions are as follows:

(1) To improve linkage quality in privacy preserving record linkage, a PPRL algorithm based on a Siamese neural network is proposed. The algorithm is designed around the differences between categories of sensitive attributes and uses a hybrid Bloom filter encoding method to strengthen security during encoding while achieving anonymization. In addition, the algorithm constructs a record matching model based on a Siamese neural network to improve the accuracy of entity matching. Experimental results show that the Siamese-network-based PPRL algorithm achieves higher linkage quality, better adaptability to different scenarios, and better robustness.

(2) To improve the time performance of privacy preserving record linkage, a PPRL algorithm based on fuzzy commitment is proposed. The algorithm consists of two phases: a record registration phase and a matching verification phase. Across these two phases, the algorithm combines the hybrid Bloom filter encoding with a random matrix strategy to improve security and shorten execution time. Experimental results show that the fuzzy-commitment-based PPRL algorithm has a shorter execution time and better performance than the baseline algorithm. In addition, the algorithm is of practical value, as its performance shows no significant degradation as the sample size increases.
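As a rough illustration of the kind of Bloom filter encoding that PPRL methods commonly build on, the following Python sketch maps the character bigrams of a string to keyed bit positions and compares two encodings with the Dice coefficient. It is a generic baseline under stated assumptions, not the hybrid Bloom filter encoding proposed in the thesis. The function names, the shared secret key, the 1024-bit filter length, and the use of four hash functions are all illustrative choices that do not come from the abstract.

```python
import hashlib
import hmac


def bigrams(value: str) -> set[str]:
    """Split a padded, lower-cased string into its character bigrams."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value: str, secret: bytes, size: int = 1024,
                 num_hashes: int = 4) -> set[int]:
    """Return the bit positions set to 1 in the Bloom filter for `value`.

    Each bigram is hashed `num_hashes` times with a keyed hash (HMAC-SHA256),
    so only parties holding `secret` can produce comparable encodings.
    The key and parameters are assumptions made for this sketch.
    """
    positions: set[int] = set()
    for gram in bigrams(value):
        for k in range(num_hashes):
            message = f"{k}:{gram}".encode("utf-8")
            digest = hmac.new(secret, message, hashlib.sha256).digest()
            positions.add(int.from_bytes(digest[:8], "big") % size)
    return positions


def dice_similarity(a: set[int], b: set[int]) -> float:
    """Dice coefficient between two sets of bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    key = b"shared-secret"  # hypothetical key agreed on by the data owners
    smith = bloom_encode("Smith", key)
    smyth = bloom_encode("Smyth", key)
    jones = bloom_encode("Jones", key)
    print("smith vs smyth:", round(dice_similarity(smith, smyth), 3))
    print("smith vs jones:", round(dice_similarity(smith, jones), 3))
```

In this sketch, similar strings share most of their bigrams and therefore most of their bit positions, which is what lets approximate matching work on encoded values without exchanging the plaintext.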
Keywords/Search Tags: Data integration, Privacy preserving record linkage, Hybrid Bloom filter, Siamese neural networks, Fuzzy commitment