
Design And Implementation Of Web Public Opinion Information Filtering System

Posted on: 2014-08-03
Degree: Master
Type: Thesis
Country: China
Candidate: W Cheng
GTID: 2268330401965821
Subject: Software engineering
Abstract/Summary:
The information age has brought convenience to our work and daily life, but it has also brought challenges. Because the network is so open, it carries a great deal of reactionary, superstitious, pornographic, and otherwise harmful information. An important reason so much of it persists is that, although current Web page filtering software can find exactly matched (prototype) keywords, harmful content slips past the filters by deliberately transforming those keywords in various ways.

This thesis proposes an approach to countering active interference in Chinese text (anti-Chinese initiative interference) that covers prototype keyword mining, jamming characters mining, and jamming homophone mining. On this basis, it mainly completes the following tasks. First, it completes the jamming spelling words mining module: because offenders on the network often disguise keywords by replacing them with spelling (pinyin) words, we design and implement jamming spelling words mining for the Chinese initiative interference environment. Second, it proposes a secondary index database technique based on the Unicode values of Chinese characters and of spelling words, which improves on existing database retrieval techniques and makes the mining of bad keywords faster and more efficient.

Specifically, the jamming spelling words mining module handles cases in which a whole phrase, or only some of the characters in a phrase, has been converted into spelling words. Supplemented by a secondary matching technique, it determines whether a candidate is a case of Chinese initiative interference and then searches the entire text, thereby completing the jamming spelling words mining.

The software can complete three tasks. First, it accepts a single word entered in real time and performs prototype mining, jamming characters mining, jamming homophone mining, traditional Chinese character mining, and jamming spelling words mining (five kinds of processing), as well as integrated mining, on a single text. Second, given an uploaded keyword dictionary and a set of documents, it performs keyword mining across all of the uploaded documents. Third, given an uploaded keyword dictionary and a folder, it classifies the files in that folder according to a threshold value: every page that exceeds the threshold is classified as a bad page.

The software is widely applicable and can handle plain text (*.txt), documents (*.doc), and web page (*.htm and *.html) files. Its interface is clear and user-friendly. The test results show that the software can quickly complete prototype matching, jamming characters mining, jamming homophone mining, traditional Chinese character mining, and jamming spelling words mining in the Chinese initiative interference environment, and that it can also judge a batch of web pages. It therefore merits wider adoption and further development.
Keywords/Search Tags: anti-Chinese initiative interference, jamming spelling words mining module, secondary index database, secondary match