Research And Design Of Web Information Filtering System

Posted on: 2010-01-27
Degree: Master
Type: Thesis
Country: China
Candidate: H Liu
Full Text: PDF
GTID: 2178360275959239
Subject: Computer application technology
Abstract/Summary:
As the Internet reaches into every aspect of people's lives, filtering illegal web pages to create a healthy online environment has become an important research field. To address the shortcomings of current web page filtering systems, this thesis constructs a dynamic corpus, a set of illegal keywords, and an effective vector representation of web pages, and on this basis designs a multi-level, multi-strategy system for filtering web pages.

The performance of a classification algorithm depends on its training corpus: a classifier performs well when trained on a high-quality corpus. Therefore, building on the existing method of cutting samples with a density-based KNN classifier, this thesis proposes an algorithm for adding samples with a density-based KNN classifier. With this algorithm we obtain a dynamic training corpus with uniform density and broad coverage.

A good collection of illegal keywords should reflect the characteristics of illegal websites on the current Internet. This thesis therefore proposes a new illegal keyword extraction algorithm that combines the traditional illegal keyword extraction algorithm with the OCAT RA1 algorithm. The algorithm automatically obtains an illegal keyword collection of appropriate size, and with this collection the keyword filtering algorithm achieves better filtering results.

To improve classification results in web page filtering, this thesis presents a new method of representing a web page that draws on both the page's information and its structure. Using this method yields better filtering of web pages.

Finally, drawing on the characteristics of URL filtering, keyword filtering, and text classification filtering, the thesis builds a multi-level, multi-strategy web filtering system. Experimental results show that the system achieves high recall and precision and meets real-time requirements.
Keywords/Search Tags: Corpus, OCAT RA1, Text Filtration, Web Filtering
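
The multi-level, multi-strategy idea in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the names `make_filter`, `Verdict`, `blocked_hosts`, `keyword_threshold`, and `classify` are hypothetical, the cheap-to-expensive ordering of the three levels is an assumption, and the classifier is passed in as a plain callable rather than the density-based KNN model the thesis describes.

```python
from dataclasses import dataclass
from typing import Callable, Iterable
from urllib.parse import urlparse


@dataclass
class Verdict:
    blocked: bool
    stage: str  # which level made the decision ("url", "keyword", "classifier", "none")


def make_filter(
    blocked_hosts: Iterable[str],
    illegal_keywords: Iterable[str],
    keyword_threshold: int,
    classify: Callable[[str], bool],
) -> Callable[[str, str], Verdict]:
    """Build a three-level filter (hypothetical helper, for illustration only)."""
    hosts = {h.lower() for h in blocked_hosts}
    keywords = [k.lower() for k in illegal_keywords]

    def check(url: str, text: str) -> Verdict:
        # Level 1: URL filtering against a host blocklist (cheapest check).
        host = (urlparse(url).hostname or "").lower()
        if host in hosts:
            return Verdict(True, "url")

        # Level 2: keyword filtering, counting occurrences of illegal keywords.
        lowered = text.lower()
        hits = sum(lowered.count(k) for k in keywords)
        if hits >= keyword_threshold:
            return Verdict(True, "keyword")

        # Level 3: text classification (most expensive, so it runs last).
        if classify(text):
            return Verdict(True, "classifier")

        return Verdict(False, "none")

    return check
```

A page is blocked by the first level that flags it, so the classifier only runs on pages that pass the URL and keyword checks. Under this assumed ordering, that is one way such a system could keep average latency low enough for real-time use.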