
Research On Privacy Information Detection System In Data Opening

Posted on: 2020-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Han
GTID: 2438330575951399
Subject: Software engineering
Abstract/Summary:
With the development of big data, the total amount of global data is expected to reach 44 ZB. Almost every aspect of daily life will be tied to data. The government is undoubtedly the largest data owner, and government data disclosure will play a vital role in the development of society as a whole. Government data disclosure in China is currently under way, but it faces several difficulties. One key issue is how to protect personal privacy: striking a balance between data disclosure and privacy protection is an urgent problem. Published data contains a large amount of unstructured data with no obvious identifying features, so conventional detection methods struggle to find the private information it contains. In most cases, private information in such data still has to be detected manually.

To address this, the author designs a model that automatically detects entity-type private information in unstructured data. The model first uses a deep learning-based text classification model to separate the parts of the text that contain private information from the parts that do not. It then identifies the entities contained in the parts that do contain private information. Finally, it calculates weights for the recognized entities to determine which of them constitute private information. Experiments show that the model can effectively replace traditional manual detection and labeling.

For the text classification phase, this paper collects government-published data from the web, builds the corresponding data sets, designs a private-information classification model, and verifies the model's feasibility on those data sets. Experiments show that the model can automatically detect the private information contained in the published data set and largely achieves the goal of replacing manual detection, although its functionality can be further enriched and its accuracy further improved.

To detect private information in unstructured data automatically, this paper uses a deep learning-based approach to understand the data to be examined. In constructing the detection model, it proposes a named entity recognition method based on a re-ranking strategy to tag the entity-type private information in the data, which makes the recognition results easier to understand. The model can automatically examine unstructured data published by the government and provides technical support for data disclosure.
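The abstract does not specify the re-ranking features or the entity weights, so the following is only a minimal Python sketch of how the last two stages (re-ranked entity tagging, then weight-based privacy scoring) could fit together. It assumes the classification stage has already selected a segment that likely contains private information, and that a first-pass tagger has produced several candidate BIO tag sequences with scores. Everything here is hypothetical and not the author's method: the `Candidate` class, the hand-written re-ranking features (a penalty for ill-formed `I-` tags, a bonus when a `PHONE` span matches an 11-digit pattern), the per-type weights in `PRIVACY_WEIGHTS`, and the `PRIVACY_THRESHOLD` cut-off.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical privacy weight per entity type (the thesis does not publish its values).
PRIVACY_WEIGHTS = {"PER": 0.5, "PHONE": 0.8, "ID": 1.0, "LOC": 0.2}
PRIVACY_THRESHOLD = 1.0  # hypothetical cut-off for flagging a segment

# Hypothetical pattern: an 11-digit mobile phone number.
PHONE_RE = re.compile(r"\d{11}")


@dataclass
class Candidate:
    tags: list[str]    # one BIO tag per token, e.g. "B-PER", "I-PER", "O"
    base_score: float  # score from an assumed first-pass tagger (higher is better)


def extract_entities(tokens: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Collect (text, type) spans from a BIO tag sequence, decoding leniently."""
    entities, span, etype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype):
            # A B- tag, or an I- tag that does not continue the open span, starts a new span.
            if span:
                entities.append((" ".join(span), etype))
            span, etype = [tok], tag[2:]
        elif tag.startswith("I-"):
            span.append(tok)  # continue the open span of the same type
        else:
            # "O" closes any open span.
            if span:
                entities.append((" ".join(span), etype))
            span, etype = [], None
    if span:
        entities.append((" ".join(span), etype))
    return entities


def rerank_features(tokens: list[str], tags: list[str]) -> float:
    """Hand-written re-ranking features (illustrative assumptions only)."""
    score, prev = 0.0, "O"
    for tag in tags:
        # Penalize an I- tag that does not follow a tag of the same entity type.
        if tag.startswith("I-") and prev[2:] != tag[2:]:
            score -= 2.0
        prev = tag
    for text, etype in extract_entities(tokens, tags):
        # Reward PHONE spans whose text actually looks like a phone number.
        if etype == "PHONE" and PHONE_RE.fullmatch(text):
            score += 1.5
    return score


def rerank(tokens: list[str], candidates: list[Candidate], weight: float = 1.0) -> Candidate:
    """Keep the candidate with the highest base score plus weighted feature score."""
    return max(candidates, key=lambda c: c.base_score + weight * rerank_features(tokens, c.tags))


def privacy_score(entities: list[tuple[str, str]]) -> float:
    """Sum the per-type weights of the detected entities."""
    return sum(PRIVACY_WEIGHTS.get(etype, 0.0) for _, etype in entities)


# Toy usage: three candidate taggings of one segment already judged to contain private data.
tokens = ["Zhang", "San", "phone", "13800138000", "Beijing"]
candidates = [
    Candidate(["B-PER", "I-PER", "O", "O", "B-LOC"], base_score=-1.0),
    Candidate(["B-PER", "I-PER", "O", "B-PHONE", "B-LOC"], base_score=-1.3),
    Candidate(["O", "I-PER", "O", "B-PHONE", "B-LOC"], base_score=-0.9),
]
best = rerank(tokens, candidates)
entities = extract_entities(tokens, best.tags)
print(entities)  # [('Zhang San', 'PER'), ('13800138000', 'PHONE'), ('Beijing', 'LOC')]
print("private" if privacy_score(entities) >= PRIVACY_THRESHOLD else "not private")  # private
```

In the toy run, the third candidate has the best first-pass score but is penalized for opening an entity with `I-PER`, so the re-ranker prefers the second candidate, whose phone-number span passes the pattern check. Hand-written features keep the sketch dependency-free; a practical system would likely tune the features, the feature weight, and the per-type privacy weights on labeled data.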
Keywords/Search Tags: Privacy information detection, unstructured data, deep learning, named entity recognition, natural language processing