A Bad Text Filtering Method

Posted on: 2013-03-28
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhou
Full Text: PDF
GTID: 2248330374986241
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, humanity has entered a new, information-rich era. This brings a new problem: for unlawful political or economic purposes, many people use the Internet to distribute reactionary, pornographic, fraudulent and violent information. This harms the construction of a stable society, the transmission of values and, especially, the healthy growth of young people. How to purify the network environment and filter reactionary text is now a hot research topic in the construction and monitoring of the Internet.

At present, the mainstream approach to filtering reactionary texts is intelligent analysis of text content, mainly the vector space model, neural networks and semantic-based filtering. The main drawbacks of neural networks and semantic-based filtering are complicated algorithms and slow execution. The main drawbacks of the vector space model are the large amount of document-related computation and the lack of semantic factors. Because of these shortcomings, research on filtering reactionary texts has focused mainly on improving accuracy and reducing filtering time.

The main purpose of this thesis is to design a method for filtering reactionary texts that reduces filtering time as far as possible while maintaining accuracy, and that still achieves high performance when the training library is insufficient. The thesis mainly contains the following:

1. A new method of text weight calculation is used. It takes into account both the frequency of a term in the text and the term's ability to distinguish between categories, so that the text is represented better.
2. Commonly used methods of filtering reactionary texts are time-consuming. A new filtering method is therefore proposed that reduces the time spent on filtering while maintaining the same or even better accuracy.
3. In the Internet environment, the training text database is often inadequate. Two types of training text library were collected, and the proposed method is shown to have better filtering performance than classical methods on both.
4. Based on an analysis of the causes of imbalanced data sets and of the solutions proposed by others, two methods for handling the problem are presented. Text normalization is applied at the same time, and experiments show that the methods are effective.
5. A new filtering method based on the difference between the positive and negative characteristics of terms is designed and validated. The concept of a threshold limit value is proposed, together with a method for determining it. Both normalized and non-normalized variants are tested, and the best threshold is obtained experimentally.
6. By analyzing the experiments above, the best strategy for the new method of filtering reactionary texts is obtained. Experimental results show the effectiveness of the strategy.
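The abstract does not give the weight formula or the decision rule, so the following is only an illustrative sketch of the general idea: weight each term by how strongly it separates the two classes, sum the weights of a text's terms, optionally normalize by text length, and compare the result with a threshold. The function names, the smoothed log-ratio weight and the default threshold of 0.0 are all assumptions made for this example, not the thesis's actual method.

```python
import math
from collections import Counter


def term_weights(pos_docs, neg_docs):
    """Assumed weighting: smoothed log-ratio of a term's relative frequency
    in the positive (to-be-filtered) class versus the negative class.
    Positive weight = term is more typical of the positive class."""
    pos_counts = Counter(t for doc in pos_docs for t in doc)
    neg_counts = Counter(t for doc in neg_docs for t in doc)
    pos_total = sum(pos_counts.values())
    neg_total = sum(neg_counts.values())
    vocab = set(pos_counts) | set(neg_counts)
    weights = {}
    for term in vocab:
        # Add-one smoothing so terms seen in only one class stay finite.
        p = (pos_counts[term] + 1) / (pos_total + len(vocab))
        n = (neg_counts[term] + 1) / (neg_total + len(vocab))
        weights[term] = math.log(p / n)
    return weights


def score(tokens, weights, normalize=True):
    """Sum of term weights; unknown terms contribute 0.
    With normalize=True the sum is divided by the text length."""
    total = sum(weights.get(t, 0.0) for t in tokens)
    if normalize and tokens:
        total /= len(tokens)
    return total


def is_filtered(tokens, weights, threshold=0.0, normalize=True):
    """Flag the text when its score exceeds the (hypothetical) threshold."""
    return score(tokens, weights, normalize) > threshold


# Toy, made-up training data: lists of already-tokenized texts.
pos_docs = [["scam", "prize", "win"], ["scam", "money"]]
neg_docs = [["weather", "report"], ["football", "match", "report"]]
weights = term_weights(pos_docs, neg_docs)
```

In a sketch like this, the threshold would be tuned on held-out data, separately for the normalized and non-normalized scores, since the two live on different scales.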
Keywords/Search Tags: text weight, filtering reactionary text, text normalization, non-balanced dataset, positive and negative item