Research And Implementation Of Entity Recognition And Privacy Preserving Technology For Government Affairs Text Information

Posted on: 2022-08-14
Degree: Master
Type: Thesis
Country: China
Candidate: H Wang
Full Text: PDF
GTID: 2506306602990629
Subject: Computer Science and Technology
Abstract/Summary:
Open sharing of government affairs data plays an important role in promoting cooperation among government departments and improving the efficiency of government work, but the privacy and security issues that arise in the sharing process cannot be ignored. This thesis addresses two scenarios: government departments directly sharing original documents, and departments using shared data to collaboratively train machine learning models. For these scenarios, it designs and implements private entity recognition algorithms and privacy preserving algorithms for government affairs text data. The main contents are as follows:

Private entity recognition algorithms are designed according to the data type of government affairs text. The private entities in government affairs text are divided into two categories according to whether their expression follows obvious rules. For entities with obvious rules, such as ID numbers, a rule-based private entity recognition algorithm is designed. For irregular entities, such as person names and place names, a private entity recognition algorithm based on a pre-trained language model and external vocabulary knowledge is designed to address problems such as error-prone entity boundaries. Extensive experiments on a labeled government affairs private entity recognition data set and a classical named entity recognition data set show that these algorithms can effectively identify private entities in government affairs text.

A privacy preserving method with an identity authentication function is designed for direct sharing of government data. For government documents that need to be shared directly, the privacy of the shared text is protected in two respects: data encryption and transmission. For data encryption, RSA (Rivest-Shamir-Adleman) encryption is used to protect the private entities found by private entity recognition. To guard against recognition errors and the disclosure of non-private information in the text, the Advanced Encryption Standard (AES) and RSA are further combined to apply a second layer of encryption to the entire text. For data sharing, rules are defined to determine the privacy level of each file and the authority of each government department to view files at different levels, and a secure government data sharing system is designed according to the keys held by each department. Experiments on data provided by the Science and Technology Department of Shaanxi Province show that this method can ensure the privacy and security of government affairs text data in the undistorted sharing scenario.

A privacy preserving algorithm with few-shot learning ability is designed for joint training scenarios. For cases where government data must be shared to help other departments train machine learning models, a secure collaborative few-shot learning algorithm is designed with two modules: local training and parameter transmission. For local training, an algorithm built on the Model-Agnostic Meta-Learning (MAML) method and differential privacy is designed to address the lack of supervised data in the government affairs field. For parameter transmission, homomorphic encryption is used to prevent potential privacy leakage during parameter sharing. Extensive experiments on a labeled government affairs text relation extraction data set and a classical few-shot learning data set show that the algorithm improves model accuracy while protecting the security of government data.

In summary, based on the sharing requirements and data characteristics of government affairs text data, this thesis proposes a private entity recognition algorithm for identifying key information in government affairs text. On this basis, different privacy protection methods are proposed for different government data sharing scenarios. Extensive experiments show that the proposed private entity recognition algorithm and the two privacy preserving algorithms are effective in the open sharing of government affairs data.
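
The rule-based recognition of regular entities such as ID numbers, mentioned in the abstract, can be illustrated with a minimal sketch. The abstract does not state which rules the thesis uses, so this example assumes the 18-character Chinese resident identity number with its modulo-11 check character (GB 11643-1999). The names `find_id_numbers`, `is_valid_id`, and `mask_id_numbers` are hypothetical and do not come from the thesis.

```python
import re
from typing import List, Tuple

# Assumption: "ID numbers" means 18-character resident identity numbers
# (GB 11643-1999). The thesis's actual rules are not given in the abstract.
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CHARS = "10X98765432"  # indexed by (weighted sum mod 11)

# 17 digits plus a digit or X, not embedded in a longer run of digits.
ID_PATTERN = re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)", re.ASCII)


def is_valid_id(candidate: str) -> bool:
    """Check the modulo-11 check character of an 18-character candidate."""
    total = sum(int(d) * w for d, w in zip(candidate[:17], WEIGHTS))
    return CHECK_CHARS[total % 11] == candidate[17].upper()


def find_id_numbers(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, value) for each candidate that passes the checksum."""
    return [
        (m.start(), m.end(), m.group())
        for m in ID_PATTERN.finditer(text)
        if is_valid_id(m.group())
    ]


def mask_id_numbers(text: str, mask_char: str = "*") -> str:
    """Replace every checksum-valid ID number with mask characters."""
    def _mask(match):
        value = match.group()
        return mask_char * len(value) if is_valid_id(value) else value

    return ID_PATTERN.sub(_mask, text)
```

Checking the check character in addition to the pattern reduces false positives from other 18-digit strings, at the cost of missing ID numbers that were mistyped in the source document. In the thesis's pipeline the recognized spans are encrypted rather than masked; masking is used here only to keep the sketch free of external dependencies.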
Keywords/Search Tags: Government Affairs Text, Privacy Preserving, Private Entity Recognition, Encryption Technology, Differential Privacy