Sensitive Information Identification Based On Sentiment Analysis Of User Original Content

Posted on: 2022-08-12 | Degree: Master | Type: Thesis
Country: China | Candidate: K Liang
GTID: 2518306605966949 | Subject: Master of Engineering
Abstract/Summary:
With the Internet reaching thousands of households, all kinds of information now spread through cyberspace far faster than before. Criminals deliberately spread large amounts of illegal and sensitive information online, so detecting and controlling sensitive information has become increasingly important. At the same time, criminals exploit the inherent flaws of traditional sensitive information detection to evade it, which leaves traditional sensitive word detection algorithms based on string matching increasingly unable to meet current needs. As deep learning has excelled in natural language processing, more and more researchers have applied deep-learning-based sensitive information detection algorithms in this field, owing to their distinctive characteristics and high accuracy. Compared with traditional methods based on string matching, a sensitive information detection algorithm based on sentiment analysis judges the overall sensitivity of a text more accurately, because it analyzes the overall sentiment tendency of the text in addition to discovering the sensitive information it contains. This thesis therefore proposes a sensitive information detection algorithm based on sentiment analysis and, through the analysis and mining of text data, designs and implements a sensitive information detection platform. The specific work of this thesis is as follows.

To address the drop in accuracy that traditional sensitive information detection algorithms suffer when faced with deformed sensitive words, and in light of practical needs, this thesis first explains the correlation between sensitive information detection and sentiment analysis, then compares the text classification algorithms used in sensitive information detection and selects TextCNN as the sensitive word detection algorithm. On this basis, it proposes a sensitive information detection algorithm based on sentiment analysis. First, the algorithm performs sentiment analysis on the text, using a long short-term memory (LSTM) network to classify the sentiment tendency of the text as either positive or negative. At the same time, a convolutional neural network performs sensitive word detection, classifying the text as containing positive sensitive words, negative sensitive words, or no sensitive words. The outputs of the two models are then fused through a fully connected layer to produce the final detection result. The Weibo_Senti data set is cleaned (including removing stop words and special symbols), then segmented and converted into word vectors, and a sensitive information detection model based on sentiment analysis is built on it. This model is compared with a sensitive word detection model based on traditional string matching and with a sensitive information detection model based on deep learning; the experimental results show that the proposed algorithm detects sensitive information with an accuracy of 91%.

Building on this sentiment-analysis-based detection model, and with the help of deep learning models, Web frameworks, and other techniques, a sensitive information detection system is designed and implemented. The system consists of five modules: a text input module, a text preprocessing module, a model training algorithm module, a core detection algorithm module, and a data storage module. The text input module collects and organizes original user content from the network. The text preprocessing module cleans, vectorizes, and stores the collected data. The model training algorithm module trains the sentiment classification model and the sensitive word detection model simultaneously, each producing its own classification, and then integrates them to obtain the final judgment of text sensitivity. The core detection algorithm module uses the resulting model to make predictions on text. The data storage module saves the data generated by each module and connects the modules in series. With this system, collected original content from microblog users can be screened to obtain a final sensitivity result, improving the security of social network platforms.
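The string-matching baseline that the abstract argues against can be sketched as a trie (DFA-style) lexicon scan, a common way to implement sensitive word filtering. The lexicon entries are hypothetical placeholders, and the names `build_trie` and `find_sensitive` are illustrative rather than taken from the thesis. The example shows why deformed words are a problem: inserting a single symbol between the characters of a listed word is enough for exact matching to miss it.

```python
END = object()  # sentinel key marking the end of a lexicon word


def build_trie(words):
    """Build a nested-dict trie (one level per character) from lexicon words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root


def find_sensitive(text, trie):
    """Return (start_index, word) for every lexicon word found verbatim in text."""
    hits = []
    for start in range(len(text)):
        node = trie
        pos = start
        # Follow the trie as long as consecutive characters keep matching.
        while pos < len(text) and text[pos] in node:
            node = node[text[pos]]
            pos += 1
            if END in node:
                hits.append((start, node[END]))
    return hits


# Hypothetical lexicon: "gambling" and "issuing invoices on someone's behalf".
LEXICON = ["赌博", "代开发票"]
TRIE = build_trie(LEXICON)

print(find_sensitive("网上赌博平台", TRIE))   # [(2, '赌博')]  exact form is caught
print(find_sensitive("网上赌*博平台", TRIE))  # []  one inserted symbol evades the match
```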
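The LSTM sentiment branch described in the abstract can be illustrated with a minimal, dependency-free forward pass that reads a sequence of word vectors and maps the final hidden state to a probability of positive sentiment. This is a sketch under stated assumptions: the weights are random and untrained, the dimensions are arbitrary, and the names `LSTMCell` and `sentiment_prob` are hypothetical. A practical implementation would use a framework such as PyTorch or Keras and learn the weights from labelled data.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTMCell:
    """Single LSTM cell with randomly initialised (untrained) weights."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        # One (W, U, b) triple per gate: input (i), forget (f), output (o), candidate (g).
        self.params = {
            gate: (mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim), [0.0] * hidden_dim)
            for gate in ("i", "f", "o", "g")
        }

    def _pre(self, gate, x, h):
        """Pre-activation W x + U h + b for one gate."""
        W, U, b = self.params[gate]
        return [wx + uh + bk for wx, uh, bk in zip(matvec(W, x), matvec(U, h), b)]

    def step(self, x, h, c):
        i = [sigmoid(z) for z in self._pre("i", x, h)]
        f = [sigmoid(z) for z in self._pre("f", x, h)]
        o = [sigmoid(z) for z in self._pre("o", x, h)]
        g = [math.tanh(z) for z in self._pre("g", x, h)]
        # New cell state keeps part of the old state and adds gated candidate values.
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def sentiment_prob(embeddings, cell, w_out, b_out):
    """Run the cell over the sequence and map the last hidden state to P(positive)."""
    h = [0.0] * cell.hidden_dim
    c = [0.0] * cell.hidden_dim
    for x in embeddings:
        h, c = cell.step(x, h, c)
    return sigmoid(sum(w * hk for w, hk in zip(w_out, h)) + b_out)


cell = LSTMCell(input_dim=4, hidden_dim=3)
print(sentiment_prob([[0.1, 0.2, 0.3, 0.4]] * 5, cell, [0.5, -0.5, 0.2], 0.0))
```

With untrained weights the printed probability is meaningless; the point is the data flow from word vectors through the gates to a single binary sentiment score.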
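The TextCNN sensitive word branch can likewise be sketched as filters of several widths sliding over the word-vector sequence, followed by a ReLU, max-over-time pooling, and a softmax over the three classes named in the abstract. The filter widths (2, 3, 4), the number of filters per width, and the random untrained weights are assumptions for illustration, not the thesis's settings; `conv_maxpool` and `TextCNN` are hypothetical names.

```python
import math
import random


def conv_maxpool(seq, filt, bias):
    """Slide one filter (k rows x emb_dim) over seq (L rows x emb_dim),
    apply ReLU, and keep the maximum over all positions (max-over-time)."""
    k = len(filt)
    best = 0.0  # ReLU outputs are never negative; also the result if len(seq) < k
    for t in range(len(seq) - k + 1):
        window = seq[t:t + k]
        s = bias + sum(w * x for w_row, x_row in zip(filt, window)
                       for w, x in zip(w_row, x_row))
        best = max(best, s)
    return best


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


class TextCNN:
    LABELS = ("positive sensitive words", "negative sensitive words", "no sensitive words")

    def __init__(self, emb_dim, widths=(2, 3, 4), n_filters=2, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        # n_filters filters for each width, each with a zero bias.
        self.filters = [(mat(k, emb_dim), 0.0) for k in widths for _ in range(n_filters)]
        # Fully connected layer from pooled features to the three class logits.
        self.W = mat(len(self.LABELS), len(self.filters))
        self.b = [0.0] * len(self.LABELS)

    def forward(self, seq):
        feats = [conv_maxpool(seq, f, b) for f, b in self.filters]
        logits = [sum(w * x for w, x in zip(row, feats)) + bk
                  for row, bk in zip(self.W, self.b)]
        return softmax(logits)

    def predict(self, seq):
        probs = self.forward(seq)
        return self.LABELS[probs.index(max(probs))]


model = TextCNN(emb_dim=4)
seq = [[0.1 * t] * 4 for t in range(6)]  # six toy word vectors
print(model.forward(seq))
```

Using several filter widths lets the model respond to word n-grams of different lengths, and max pooling makes the detected feature independent of where in the text it appears.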
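One plausible reading of the fusion step, assumed here rather than confirmed by the abstract, is that the two branches' output probability vectors are concatenated and passed through a single fully connected layer with a softmax over "not sensitive" and "sensitive". The weights below are hand-set purely to make the example readable; a trained model would learn them. `fuse` and `is_sensitive` are hypothetical names.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


# Input order: [P(positive sentiment), P(negative sentiment),
#               P(positive sensitive words), P(negative sensitive words), P(no sensitive words)]
# Output order: [not sensitive, sensitive]
# Hand-set illustrative weights; a trained model would learn these.
W_FUSE = [
    [1.0, 0.0, 0.5, -1.0, 1.0],   # "not sensitive" row
    [0.0, 1.0, 0.5, 2.0, -1.0],   # "sensitive" row
]
B_FUSE = [0.0, 0.0]


def fuse(sentiment_probs, word_probs, W=W_FUSE, b=B_FUSE):
    """Concatenate both branch outputs and apply one fully connected layer + softmax."""
    x = list(sentiment_probs) + list(word_probs)
    logits = [sum(w * v for w, v in zip(row, x)) + bk for row, bk in zip(W, b)]
    return softmax(logits)


def is_sensitive(sentiment_probs, word_probs):
    p_not, p_sens = fuse(sentiment_probs, word_probs)
    return p_sens > p_not


print(is_sensitive([0.1, 0.9], [0.05, 0.9, 0.05]))  # True
print(is_sensitive([0.9, 0.1], [0.05, 0.05, 0.9]))  # False
```

The first call combines negative sentiment with negative sensitive words and is flagged; the second combines positive sentiment with no sensitive words and is not.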
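The data cleaning and vectorization steps applied to the Weibo data can be sketched as follows. Assumptions: the stop-word list is a hypothetical placeholder, segmentation is character-level to keep the example dependency-free (a Chinese word segmenter such as the jieba library would normally be used), and "vectorization" is shown as mapping tokens to integer indices padded to a fixed length, which an embedding layer would then turn into word vectors. The function names are illustrative.

```python
import re
from collections import Counter

# Hypothetical stop-word list; not the list used in the thesis.
STOP_WORDS = {"的", "了", "是", "在", "啊"}
# Strip URLs first; the explicit character class avoids swallowing following CJK text.
URL_RE = re.compile(r"https?://[A-Za-z0-9./?=&_%#-]+")
# Keep single CJK characters and runs of ASCII letters/digits; punctuation,
# emoji and other special symbols are dropped.
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[a-z0-9]+")


def preprocess(text):
    """Clean, segment (character level) and remove stop words."""
    text = URL_RE.sub(" ", text.lower())
    return [tok for tok in TOKEN_RE.findall(text) if tok not in STOP_WORDS]


def build_vocab(corpus, min_count=1):
    """Map tokens to indices; 0 is padding and 1 is unknown."""
    counts = Counter(tok for text in corpus for tok in preprocess(text))
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, count in counts.most_common():
        if count >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def vectorize(text, vocab, max_len=16):
    """Convert text to a fixed-length index sequence (truncate or pad)."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in preprocess(text)][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


vocab = build_vocab(["今天天气好", "天气差"])
print(preprocess("今天的天气!! http://t.cn/AbC"))  # ['今', '天', '天', '气']
print(vectorize("天气很好", vocab, max_len=6))     # [2, 3, 1, 5, 0, 0]
```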
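The five-module system can be pictured as a simple pipeline in which each collected post is preprocessed, scored, and stored. In this hypothetical sketch, the preprocessing and detection functions are injected stubs standing in for the trained models, an in-memory SQLite database stands in for the data storage module, and model training is omitted entirely; `DetectionPipeline` and its table layout are illustrative, not the thesis's design.

```python
import sqlite3
from typing import Callable, List


class DetectionPipeline:
    """Hypothetical wiring of input -> preprocessing -> detection -> storage."""

    def __init__(self, preprocess: Callable[[str], List[str]],
                 detect: Callable[[List[str]], bool]):
        self.preprocess = preprocess  # stands in for the text preprocessing module
        self.detect = detect          # stands in for the core detection module
        # An in-memory SQLite database stands in for the data storage module.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE posts (id INTEGER PRIMARY KEY, raw TEXT, tokens TEXT, sensitive INTEGER)"
        )

    def ingest(self, raw: str) -> bool:
        """Text input entry point: process one post and store the result."""
        tokens = self.preprocess(raw)
        flag = bool(self.detect(tokens))
        self.db.execute(
            "INSERT INTO posts (raw, tokens, sensitive) VALUES (?, ?, ?)",
            (raw, " ".join(tokens), int(flag)),
        )
        self.db.commit()
        return flag

    def flagged(self) -> List[str]:
        """Return the raw text of every post judged sensitive, in arrival order."""
        rows = self.db.execute("SELECT raw FROM posts WHERE sensitive = 1 ORDER BY id")
        return [row[0] for row in rows]


# Usage with trivial stubs in place of trained models:
pipeline = DetectionPipeline(preprocess=str.split, detect=lambda toks: "spam" in toks)
pipeline.ingest("buy spam now")
pipeline.ingest("nice weather today")
print(pipeline.flagged())  # ['buy spam now']
```

Injecting the preprocessing and detection functions keeps the storage and input logic independent of which trained model is deployed.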
Keywords/Search Tags: Sensitive Information Detection, Neural Network, Sentiment Analysis, Text Classification, Deep Learning