A Method Of Document Sensitivity Calculation Based On Semantic Dependency Analysis

Posted on: 2019-03-26 | Degree: Master | Type: Thesis
Country: China | Candidate: G Lu
GTID: 2428330542498720 | Subject: Computer Science and Technology
Abstract/Summary:
Online communities are an important part of the Internet, and their security and stability have drawn wide attention. The spread of sensitive information in online communities has caused great harm to social harmony. Sensitive information identification methods can detect such information and provide early warning to prevent its spread, which is of great significance for guiding online public opinion and creating a healthy public opinion environment. Existing methods, however, cannot understand the meaning expressed in the corpus, so their recognition performance on online community text still needs to be improved. Building on research into sensitive information recognition at home and abroad, and combining it with semantic analysis and calculation techniques, this thesis designs and implements a document sensitivity calculation method based on semantic dependencies. The method expands the sensitive dictionary and improves sensitive word recognition through sensitive sequence annotation, and it calculates sensitivity separately at the sentence level and the document level. At the sentence level, semantic dependency analysis and a local sensitivity transfer algorithm are used to process sentences and extract sentence-level sensitive vectors. At the document level, a sensitive sentence structure matching strategy and a sensitivity grading strategy are used to grade the sensitivity of the document, which provides a reference for public opinion monitoring. To verify the accuracy of the algorithm, a comparative experiment was designed on sensitive documents at all levels. In the experiment, several common classifiers, including NB, SVM, KNN and AdaBoost, were used to classify documents by sensitivity. The experimental results show that the proposed algorithm reaches 84.51% on the experimental data sets, about 10% higher than the control algorithm, and that it identifies documents at all levels in a more balanced way.
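The abstract does not give the details of the local sensitivity transfer algorithm or the grading strategy, so the following is only a rough sketch of the general idea under stated assumptions, not the thesis's method. The dictionary contents, the `decay` factor, the max-based transfer rule, the grading thresholds, and the function names `sentence_sensitivity` and `document_grade` are all hypothetical; the dependency edges are assumed to come from some external semantic dependency parser.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical sensitive dictionary: word -> base sensitivity in [0, 1].
SENSITIVE_DICT: Dict[str, float] = {"attack": 0.9, "riot": 0.8}


def sentence_sensitivity(
    tokens: Sequence[str],
    edges: Sequence[Tuple[int, int]],
    decay: float = 0.5,
) -> List[float]:
    """Return a per-token sensitivity vector for one sentence.

    Each token starts with its dictionary score. Every dependency edge
    (head index, dependent index) then passes a decayed share of a word's
    base score to its direct neighbour (assumed transfer rule).
    """
    base = [SENSITIVE_DICT.get(tok.lower(), 0.0) for tok in tokens]
    scores = list(base)
    for head, dep in edges:
        scores[dep] = max(scores[dep], decay * base[head])
        scores[head] = max(scores[head], decay * base[dep])
    return scores


def document_grade(
    sentence_vectors: Sequence[Sequence[float]],
    thresholds: Tuple[float, ...] = (0.3, 0.7),
) -> int:
    """Grade a document by its most sensitive token (assumed rule).

    The grade is the number of thresholds that the peak score reaches,
    so with two thresholds the grades are 0, 1 and 2.
    """
    peak = max((max(vec) for vec in sentence_vectors if vec), default=0.0)
    return sum(peak >= t for t in thresholds)


if __name__ == "__main__":
    tokens = ["They", "plan", "attack"]
    edges = [(1, 0), (1, 2)]  # "plan" governs "They" and "attack"
    vector = sentence_sensitivity(tokens, edges)
    print(vector, document_grade([vector]))
```

In this sketch the word "plan" inherits part of the sensitivity of "attack" through their dependency edge, which illustrates how a transfer step can mark context words that a plain dictionary lookup would miss. The resulting sentence vectors could then serve as features for classifiers such as those named in the abstract.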
Keywords/Search Tags: document sensitivity, semantic role, semantic dependency, sensitive vector, classifier