Study Of Text Filtering Based On Fuzzy Text Restoration

Posted on: 2021-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: C M Xu
Full Text: PDF
GTID: 2518306032465044
Subject: Information Science
Abstract/Summary:
The rapid development of the Internet has created good conditions for the timely sharing of information, and the volume of information on the network has grown exponentially. The development of the Internet is, however, a double-edged sword. On the one hand, a wealth of information is pouring into the Internet, allowing users to get the information they need more efficiently and easily. On the other hand, this same characteristic of the network is used to spread reactionary and pornographic information, which harms the network environment and adversely affects social stability and people's lives. This effect is particularly pronounced among teenagers. Purifying the network environment and filtering out bad information is therefore an urgent issue in the construction of network security. Information on the network exists in a variety of forms but mainly as text, so screening bad text is an important component of bad information filtering. At present, there are two ways to filter bad text. The first classifies text as normal or bad using text classification and then filters out the bad text. The second matches the words in the text against a collected list of sensitive words. Drawing on the advantages of both, this paper designs a filtering method for bad text. The main work of this paper includes:
(1) A definition of fuzzy text is given. Bad text contains a variety of bad words. To spread bad text on the network, criminals usually blur it before sending it. Through statistics and analysis of a large corpus, we give a definition of fuzzy text and a scheme for judging it along multiple dimensions.
(2) A restoration method for fuzzy glyphs in fuzzy text is designed. By analyzing bad text, we summarize several common ways in which its glyphs are blurred. For each kind of blurring, a large amount of glyph-related data is collected and a corresponding restoration method is designed according to actual needs.
(3) A machine translation model is applied to the restoration of Pinyin transliteration. Existing Pinyin transliteration methods cannot correctly convert longer text. In this paper, Pinyin transliteration is treated as a machine translation task, and the Sequence to Sequence model from machine translation is used to solve this problem, which converts Pinyin into the correct Chinese characters more reliably.
(4) The fuzzy text restoration scheme proposed in this paper is used to identify bad text. We crawled data containing bad text from Weibo and labeled it, then compared the recognition scheme proposed in this paper with an existing text auditing API. The experimental results show that the accuracy of bad text recognition after restoration is higher than that without restoration.
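The restore-then-match idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not publish the thesis's data or models, so everything named here is an assumption made for illustration: the tables `GLYPH_MAP`, `PINYIN_MAP`, and `SENSITIVE_WORDS`, and the functions `restore` and `contains_sensitive`. In particular, the dictionary lookup for Pinyin is only a toy stand-in for the Sequence to Sequence model the thesis uses, and the placeholder word 赌博 ("gambling") is chosen only as a neutral example.

```python
import re

# Hypothetical lookup tables; the thesis's real data is not given in the abstract.
GLYPH_MAP = {"贝者": "赌"}      # split-component form -> original character
PINYIN_MAP = {"dubo": "赌博"}   # toy stand-in for a learned Pinyin-to-character model
SENSITIVE_WORDS = {"赌博"}      # placeholder sensitive word list

# Characters that may be inserted between characters to break up a word.
# Removing all of them is deliberately aggressive; this is only a sketch.
SEPARATORS = re.compile(r"[\s\*\-_\.\|/]+")


def restore(text: str) -> str:
    """Undo a few simple obfuscations so that word matching can succeed."""
    restored = SEPARATORS.sub("", text)
    # Replace split or look-alike glyph sequences with the original character.
    for fuzzy, original in GLYPH_MAP.items():
        restored = restored.replace(fuzzy, original)
    # Replace known Pinyin spellings with Chinese characters (case-insensitive).
    for pinyin, chars in PINYIN_MAP.items():
        restored = re.sub(re.escape(pinyin), chars, restored, flags=re.IGNORECASE)
    return restored


def contains_sensitive(text: str) -> bool:
    """Return True if any word from the sensitive word list occurs in the text."""
    return any(word in text for word in SENSITIVE_WORDS)


if __name__ == "__main__":
    for sample in ["贝者*博", "D-u b_o", "今天天气很好"]:
        print(sample, contains_sensitive(sample), contains_sensitive(restore(sample)))
```

In this sketch the first two samples match the word list only after `restore` is applied, which mirrors the abstract's claim that recognition improves once fuzzy text has been restored. A real system would replace the toy tables with the collected glyph data and the trained model.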
Keywords/Search Tags: text filtering, fuzzy text, text restoration, bad text