Research And Implementation Of Security Classification And Privacy Data Recognition Algorithm On Government Data

Posted on: 2021-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Li
Full Text: PDF
GTID: 2506306050464754
Subject: Computer Science and Technology
Abstract/Summary:
With the development of e-government, governments at all levels have established their own information release platforms, which improve the utilization and sharing of government data. On these platforms, privacy data recognition and data security classification are still specified manually, which is inefficient. Natural language processing algorithms based on deep learning have been proposed in recent years and have achieved strong results in various fields. This paper therefore focuses on the characteristics of the Chinese text found in government data and, building on existing algorithms, applies deep learning methods to improve the efficiency and accuracy of privacy data recognition and data security classification. The main results of the paper are as follows.

(1) For the problem of private data recognition, a model based on rules and named entity recognition is designed to identify private data items in the text both with and without explicit rules. Building on a current general-purpose named entity recognition algorithm, an attention mechanism is introduced to extract global information from the text and so improve the accuracy of the model. In addition, a bidirectional gated recurrent unit (BiGRU) replaces the bidirectional long short-term memory (BiLSTM) network in the baseline, which reduces the number of model parameters and shortens training time. Experimental results show that the proposed algorithm can effectively solve the problem of privacy data recognition.

(2) For the problem of data security classification, a model based on information entropy is proposed. In most existing methods, the input to the text classification model is the word representation of the text, which is a less effective feature for the data security classification problem. This paper uses a privacy measurement method based on average self-information and mutual information, and uses Doc2Vec to obtain features for each segment of text. The two sets of features are then fed into the classification model for data security classification. The experimental results show that the fused features classify better than the word representation alone.

The private data recognition and data security classification models designed in this paper can set different privacy evaluation items for different application environments. They also take into account the overall semantic information of the text during data security classification. These models can improve the efficiency and accuracy of privacy data recognition and data security classification for government text data. This work is a key link in the opening and sharing of government data and can reduce privacy disclosure to some extent.
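The abstract names average self-information and mutual information as the basis of its privacy measurement but does not give the formulas. The following is a minimal sketch of one plausible reading, not the thesis's actual method. It assumes each document has already been reduced to a list of detected privacy item types, that an item's probability is estimated as the fraction of documents containing it, and that mutual information is measured between an item's presence and a security label. The item names (`ID_NUMBER`, `PHONE`), the labels (`high`, `low`), and all function names are hypothetical.

```python
import math
from collections import Counter


def item_probabilities(docs):
    """Estimate p(item) as the fraction of documents containing the item (assumption)."""
    n = len(docs)
    counts = Counter(item for doc in docs for item in set(doc))
    return {item: c / n for item, c in counts.items()}


def self_information(p):
    """Self-information in bits: I(x) = -log2 p(x). Rarer items score higher."""
    return -math.log2(p)


def average_self_information(doc, probs):
    """Mean self-information over the distinct items found in one document.

    Every item in `doc` must appear in `probs`, i.e. in the corpus used to
    estimate the probabilities.
    """
    items = set(doc)
    if not items:
        return 0.0
    return sum(self_information(probs[i]) for i in items) / len(items)


def mutual_information(docs, labels, item):
    """I(X;Y) in bits between presence of `item` (X) and the security label (Y)."""
    n = len(docs)
    joint = Counter((item in doc, label) for doc, label in zip(docs, labels))
    px = Counter(item in doc for doc in docs)
    py = Counter(labels)
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


# Hypothetical toy corpus: privacy item types detected per document.
docs = [["ID_NUMBER", "PHONE"], ["PHONE"], [], ["ID_NUMBER"]]
labels = ["high", "low", "low", "high"]

probs = item_probabilities(docs)
doc_score = average_self_information(docs[0], probs)
mi_id = mutual_information(docs, labels, "ID_NUMBER")
mi_phone = mutual_information(docs, labels, "PHONE")
```

In this toy corpus the presence of `ID_NUMBER` fully determines the label, so its mutual information is high, while `PHONE` is independent of the label and scores zero. Under the abstract's description, scores of this kind would be combined with Doc2Vec segment vectors before classification; how the thesis fuses them is not stated.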
Keywords/Search Tags: Private Data Recognition, Data Security Classification, Deep Learning, Openness and Sharing, Government Data