
Design And Implementation Of Unstructured Text Sensitive Information Detection System Based On Convolutional Neural Network

Posted on: 2020-10-15    Degree: Master    Type: Thesis
Country: China    Candidate: H Yu
GTID: 2428330572972234    Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, computer hardware, and mobile devices, users store large amounts of data and text in electronic documents, which they exchange and transmit anytime and anywhere. The widespread use of electronic text documents poses an information security risk: leaking sensitive information from unstructured text documents is costly for individuals, businesses, and governments. Detecting sensitive information to prevent data leakage is therefore an important problem in information security.

At present, practical detection methods fall roughly into two types: sensitive-word matching and traditional machine learning. Both rely on the frequency with which feature keywords co-occur with sensitive seed words, which in practice may fail to detect more complex patterns of sensitive information. These methods are also influenced by subjective human judgment: they attend only to the appearance of words and features, detach them from the context of the text, ignore the meaning across sentences, and crudely follow the principle that "a document containing a keyword feature is sensitive."

In recent years, researchers have proposed using recurrent neural networks (RNNs) for sensitive information detection, exploiting the contextual information of documents to predict their sensitivity more accurately, because the structure of the model is better suited to the problems above. However, although this approach improves accuracy, building and training the model takes a long time, which may reduce efficiency in practical applications. A convolutional neural network (CNN), another type of deep learning model, can shorten training time and improve efficiency while retaining the advantages of the RNN model.

This paper proposes using the Text-CNN model in place of the recurrent neural network model, treating text sensitive information detection as a special case of text classification. It proposes a CNN-based method for detecting sensitive information in unstructured text and designs a detection system around it. The method aims to improve the detection accuracy of the model, reduce the time needed to build and train it, and improve overall detection efficiency, achieving detection that is both efficient and accurate. The main research contents of this paper are as follows:

(1) The process of detecting sensitive information in text is abstracted into a special binary text classification task with two classes, 'sensitive' and 'non-sensitive'. This paper studies existing text classification techniques based on deep learning models and, taking into account the domain and characteristics of unstructured text, identifies the most suitable classification method.
(2) A sensitive information detection method based on a convolutional neural network is proposed. More than 10,000 unstructured text documents were selected and divided into training and test sets. With the Text-CNN model as the core, the model details and required hyperparameters were tuned and sensitive information detection models were trained. A large number of comparative experiments verify that the proposed method has a degree of practical value.
(3) The USID system (Unstructured Sensitive Information Detection System) was designed and implemented using the proposed detection model. Functional and stress tests were performed on the system to demonstrate its high availability and stability.
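To make the Text-CNN classifier referred to above more concrete, the following is a minimal, untrained sketch in plain Python of the forward pass of a Kim-style Text-CNN for the two-class 'sensitive' / 'non-sensitive' task: embedding lookup, one-dimensional convolutions with several filter widths, ReLU, max-over-time pooling, and a single sigmoid output. It is an illustration under stated assumptions, not the thesis's implementation: the class name `TextCNNSketch`, the embedding size, the filter widths (2, 3, 4), and the number of filters per width are hypothetical choices; the weights are random rather than trained; and tokenization of documents into integer ids is assumed to happen elsewhere.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def rand_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    """Small random weights; a real model would learn these by training."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TextCNNSketch:
    """Forward pass of a Text-CNN binary classifier (sensitive / non-sensitive).

    Hypothetical hyperparameters: embed_dim, widths and filters_per_width are
    illustrative values, not settings reported in the thesis.
    """

    def __init__(self, vocab_size: int, embed_dim: int = 8,
                 widths: Sequence[int] = (2, 3, 4), filters_per_width: int = 4,
                 seed: int = 0) -> None:
        rng = random.Random(seed)
        self.embed = rand_matrix(vocab_size, embed_dim, rng)  # vocab_size x embed_dim
        self.widths = tuple(widths)
        # Each filter is a (width x embed_dim) weight window plus a scalar bias.
        self.filters = {
            w: [(rand_matrix(w, embed_dim, rng), 0.0) for _ in range(filters_per_width)]
            for w in self.widths
        }
        n_features = len(self.widths) * filters_per_width
        self.out_w = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
        self.out_b = 0.0

    def features(self, token_ids: Sequence[int]) -> List[float]:
        """One pooled feature per filter: convolution -> ReLU -> max over time."""
        x = [self.embed[t] for t in token_ids]  # seq_len x embed_dim
        feats: List[float] = []
        for w in self.widths:
            for kernel, bias in self.filters[w]:
                # ReLU outputs are never negative, so starting the max at 0.0
                # gives the true maximum whenever at least one window fits.
                best = 0.0
                # Valid convolution: slide the window over every position where it fits.
                for start in range(len(x) - w + 1):
                    s = bias
                    for i in range(w):
                        s += sum(k * v for k, v in zip(kernel[i], x[start + i]))
                    best = max(best, max(0.0, s))  # ReLU, then max-over-time pooling
                feats.append(best)
        return feats

    def predict_proba(self, token_ids: Sequence[int]) -> float:
        """Probability that the document is 'sensitive' (sigmoid over one logit)."""
        if len(token_ids) < max(self.widths):
            raise ValueError("document shorter than the widest filter; pad it first")
        f = self.features(token_ids)
        logit = self.out_b + sum(a * b for a, b in zip(self.out_w, f))
        return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    model = TextCNNSketch(vocab_size=50)
    p = model.predict_proba([3, 17, 8, 42, 5, 9])  # hypothetical token ids
    print("sensitive" if p >= 0.5 else "non-sensitive", round(p, 3))
```

A single sigmoid logit is equivalent to a two-way softmax for a binary task. Max-over-time pooling is what lets filters of each width detect a local phrase pattern wherever it occurs in the document, which is how a Text-CNN captures context beyond isolated keywords. A practical system would train these weights on labeled documents (typically with a deep learning framework) and tune the hyperparameters, which this sketch does not attempt.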
Keywords/Search Tags: Sensitive information, Convolutional neural network, Unstructured text, Data leak prevention