
Research And Application Of Chinese Sensitive Word Filtering Technology Based On Information Source

Posted on: 2015-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: X M Zhang
GTID: 2208330431978191
Subject: Software engineering
Abstract/Summary:
With the development and spread of the mobile internet and mobile applications, the explosion of network information has made monitoring and filtering it a severe test. Moreover, existing monitoring systems mainly examine information that has already been published, so sensitive information still has time to spread across the Internet. Although some applications have built their own information filters (e.g., Whip), most of them simply keep a blacklist in a database, which is inefficient at identifying deformed sensitive words and requires a large amount of storage space.

To address these problems in filtering network information, this thesis studies two aspects: information preprocessing and pattern-matching algorithms. First, it summarizes the existing preprocessing methods and matching algorithms, compares three multi-pattern matching algorithms experimentally, and selects the WM algorithm, which performed better in the comparison, as the basis for the rest of the work. Then it analyzes the requirements of the application, designs a set of methods for handling deformed sensitive words, and improves the efficiency of WM in keyword filtering by improving its parameters.

The main results of this study are as follows: it proposes the concept of "Filtering at the Source of Information"; improves the time efficiency of WM in sensitive word filtering by improving its parameters; designs and implements an information source-filtering module; and verifies that the module performs well. The module has two parts: text preprocessing and matching-filtering. Text preprocessing restores deformed sensitive words that have been disguised with special characters, character splitting, or traditional Chinese characters. The matching-filtering part applies different treatments to sensitive information of different sensitivity levels. The module has the following functions and features:

1. It automatically processes the text and restores sensitive information that contains special characters, split words, or traditional characters;
2. It filters information quickly, before the information is published;
3. It handles sensitive information differently according to its sensitivity level;
4. It performs well in filtering rate, timeliness, resistance to interference, and reusability.

The result of this study is a line of defense for information at the entrance to the Internet: most sensitive information can be filtered before it is published, which stops its transmission at the source. In brief, the module can block and filter harmful information as early and as completely as possible, and it contributes to assuring the quality of network information.
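The two-part module described above (text preprocessing followed by WM matching with level-dependent handling) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The names `normalize`, `WuManber`, and `filter_text`, the two-entry traditional-to-simplified table, the word list, and the three levels with their actions are all assumptions made for illustration. Recovery of split characters is omitted, and the parameter changes the thesis made to WM are not reproduced here.

```python
import re

# Hypothetical two-entry traditional-to-simplified table; a real one is far larger.
T2S = str.maketrans({"賭": "赌", "槍": "枪"})
# Punctuation, whitespace, symbols and underscores inserted to disguise a word.
NOISE = re.compile(r"[\W_]+")


def normalize(text):
    """Map traditional characters to simplified ones, then drop noise characters."""
    return NOISE.sub("", text.translate(T2S))


class WuManber:
    """Simplified Wu-Manber (WM) multi-pattern matcher.

    `patterns` maps each word to a sensitivity level (an assumed scheme).
    `block` is the block size B used to index the shift table.
    """

    def __init__(self, patterns, block=2):
        self.patterns = dict(patterns)
        self.B = block
        self.m = min(len(p) for p in self.patterns)  # shortest pattern length
        if self.m < self.B:
            raise ValueError("every pattern must have at least `block` characters")
        self.default = self.m - self.B + 1  # shift for blocks seen in no pattern
        self.shift = {}
        self.bucket = {}
        for p in self.patterns:
            # Only the first m characters of each pattern feed the shift table.
            for j in range(self.B, self.m + 1):
                blk = p[j - self.B:j]
                self.shift[blk] = min(self.shift.get(blk, self.default), self.m - j)
            # Patterns are grouped by the last block of their length-m prefix.
            self.bucket.setdefault(p[self.m - self.B:self.m], []).append(p)

    def search(self, text):
        """Return (start, word, level) for every occurrence, in scan order."""
        hits = []
        i = self.m  # exclusive end index of the current length-m window
        while i <= len(text):
            blk = text[i - self.B:i]
            s = self.shift.get(blk, self.default)
            if s == 0:
                start = i - self.m
                for p in self.bucket.get(blk, []):
                    if text.startswith(p, start):
                        hits.append((start, p, self.patterns[p]))
                i += 1
            else:
                i += s
        return hits


def filter_text(text, matcher):
    """Assumed policy: level 3 rejects, level 2 masks, level 1 flags for review."""
    clean = normalize(text)
    hits = matcher.search(clean)
    top = max((level for _, _, level in hits), default=0)
    if top >= 3:
        return "reject", None
    if top == 2:
        chars = list(clean)
        for start, word, _ in hits:
            chars[start:start + len(word)] = "*" * len(word)
        return "mask", "".join(chars)
    return ("review" if top == 1 else "pass"), clean


# Hypothetical word list with assumed sensitivity levels.
matcher = WuManber({"赌博": 2, "枪支": 3, "代开发票": 1})
print(filter_text("這裡有賭*博", matcher))
print(filter_text("這裡有槍-支", matcher))
```

Because the filter runs on the normalized text, a word disguised as `賭*博` is matched as `赌博` before the message is published. With two-character words the shortest pattern length equals the block size, so the window advances one character at a time. Longer minimum pattern lengths allow larger shifts, which is one reason the choice of WM parameters affects filtering time.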
Keywords/Search Tags:Information Source-Filtering, Sensitive Words, Chinese, Preprocessing, Matching