The Study And Application Of The Identification Method For The Change Form Of Chinese Sensitive Words

Posted on: 2019-07-08
Degree: Master
Type: Thesis
Country: China
Candidate: C Fu
Full Text: PDF
GTID: 2428330545957136
Subject: Systems analysis and integration
Abstract/Summary:
With the rapid development of mobile Internet technology, people can access information on current politics, the economy, entertainment, and daily life anytime and anywhere, and can just as quickly publish messages online. While most users enjoy this convenience, some malicious users publish harmful sensitive content for their own benefit, including violence and pornography, politically sensitive material, ethnic discrimination, content that threatens social stability, and online and telecom fraud. If such content is not detected accurately and handled promptly, offenders can exploit the gap, which harms the long-term stability and healthy development of society and the country. To purify and supervise the network environment, text that contains sensitive information must be identified and dealt with.

At present, most research on sensitive word recognition compares the text to be checked against an existing sensitive-word lexicon. This approach is highly accurate for sensitive words that appear in their original form, but it is too simple. In recent years, many malicious publishers have deformed the sensitive words in their text to evade platform censorship, so that the platform cannot recognize the true meaning. Methods for recognizing the variant forms of sensitive words are therefore urgently needed.

By analyzing the structure and pronunciation of Chinese characters, this thesis proposes a method for recognizing the variant forms of Chinese sensitive words. For three kinds of variant (pinyin substitution, abbreviation, and character splitting), it designs, respectively, a recognition algorithm based on grouping confusable pinyin, a string abbreviation recognition algorithm, and a KMP-based split-character recognition algorithm, improving the accuracy and efficiency of review. The experimental results show that the proposed method achieves higher recall and precision when recognizing variant forms of Chinese sensitive words.

After sensitive words have been identified, the text still has to be examined manually to filter out the more sensitive items, which consumes considerable human and material resources. To address this, and building on the automatic recognition algorithms above, this thesis computes a sensitivity score for each text from the category, frequency, and position of its sensitive words, together with changes in which words are sensitive during special periods. Text is then reviewed automatically according to the computed sensitivity, which effectively reduces the workload of web page auditing and improves the efficiency of filtering sensitive text.
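
The abstract does not give the details of the KMP-based split-character algorithm, so the following is only a minimal sketch of the general idea, not the thesis's implementation. It assumes a hypothetical decomposition table (SPLIT_TABLE) that maps a character to the components it may be split into, expands a lexicon word into its split form, and then locates that form in the text with standard Knuth-Morris-Pratt matching. The table entries, the function names, and the neutral example words are all assumptions made for illustration.

```python
# Hypothetical decomposition table (an assumption, not from the thesis):
# each character maps to the components a writer might split it into.
SPLIT_TABLE = {"好": "女子", "明": "日月"}


def build_failure(pattern):
    """Return the KMP failure table: fail[i] is the length of the longest
    proper prefix of pattern[:i + 1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_find_all(text, pattern):
    """Return the start index of every (possibly overlapping) occurrence
    of pattern in text, using KMP matching."""
    if not pattern:
        return []
    fail = build_failure(pattern)
    hits = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # keep going to catch overlapping matches
    return hits


def split_form(word, table=SPLIT_TABLE):
    """Expand each character of word into its split components, leaving
    characters without a table entry unchanged."""
    return "".join(table.get(ch, ch) for ch in word)


def find_split_variants(text, word, table=SPLIT_TABLE):
    """Locate the split-character form of a lexicon word in text."""
    return kmp_find_all(text, split_form(word, table))
```

For example, with the table above, find_split_variants("今天日月天气女子", "明") searches the text for "日月". A fuller system would presumably need a much larger decomposition table and would have to handle characters with more than one plausible split, neither of which this sketch attempts.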
Keywords/Search Tags: Change form, Sensitive Word Recognition, Edit Distance, BM Algorithm