Design and Implementation of Web Text Filtering System

Posted on: 2010-05-08
Degree: Master
Type: Thesis
Country: China
Candidate: F X Shen
GTID: 2178360275459247
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the amount of information is growing explosively. Text information filtering technology has made great progress, and information filtering based on web text has become a research hotspot. This thesis builds on earlier research into the IPCG gateway and studies how to improve the gateway's supervision capability for public services. By studying real-time content filtering under Linux and the relevant text filtering technology, the thesis proposes and implements a web text filtering system based on the IPCG gateway.

First, the thesis presents the overall framework of the system, which combines real-time online filtering with offline filtering, and puts forward a distributed filtering system modeled on the database synchronization algorithm of the OSPF routing protocol. The real-time online filtering module includes two processes: packet preprocessing, and IP-based and keyword-based filtering. Packet preprocessing aims to obtain correct data through content analysis and structural analysis of web pages. IP-based and keyword-based filtering use a hash-tree structure to organize the IP blacklist and a cache strategy to store filtering content. Keyword-based filtering, combined with statistical information, assigns a category to each page.

The offline filtering module performs further offline analysis of the example pages and the unascertained pages, and then updates the IP blacklist and the keyword list used by the online filtering module. For this module the thesis puts forward a feature extraction algorithm and a filtering strategy. The feature extraction algorithm considers the length of features, the structural information of pages, and the semantic orientation of features. The filtering strategy uses SVM in the initial filtering stages and an improved adaptive template-based algorithm in the later stages. To update the profile, it uses an improved coefficient adjustment strategy together with a feature attenuation factor.

The experimental results show that the proposed method keeps the filtering process and data transfer independent of each other, while improving both the speed and the accuracy of online filtering.
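The online filtering stage described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not give the hash-tree layout, the scoring formula, or the cache policy, so everything here is an assumption made for illustration. The names `IPBlacklist`, `classify`, and `online_filter`, the per-octet dictionary tree, the keyword weights, and the two thresholds are all hypothetical.

```python
class IPBlacklist:
    """Assumed hash-tree layout: one dict level per IPv4 octet.

    A value of True marks a blocked address or address prefix.
    """

    def __init__(self):
        self.root = {}

    def add(self, prefix):
        """Block a full address ("192.168.1.7") or a prefix ("10.0")."""
        node = self.root
        parts = prefix.split(".")
        for octet in parts[:-1]:
            child = node.get(octet)
            if child is True:
                return  # a shorter prefix already covers this entry
            if child is None:
                child = node[octet] = {}
            node = child
        node[parts[-1]] = True

    def contains(self, ip):
        """Return True if the address or any of its prefixes is blocked."""
        node = self.root
        for octet in ip.split("."):
            node = node.get(octet)
            if node is None:
                return False
            if node is True:
                return True
        return False


def classify(text, keyword_weights, block_at=0.05, pass_below=0.01):
    """Score a page by weighted keyword frequency (assumed statistic).

    Returns "block", "pass", or "uncertain"; uncertain pages are the ones
    that would be handed to offline analysis.
    """
    words = text.lower().split()
    if not words:
        return "pass"
    score = sum(keyword_weights.get(w, 0.0) for w in words) / len(words)
    if score >= block_at:
        return "block"
    if score < pass_below:
        return "pass"
    return "uncertain"


def online_filter(ip, url, text, blacklist, keyword_weights, cache):
    """IP check first, then cached verdict, then keyword scoring."""
    if blacklist.contains(ip):
        return "block"
    if url in cache:
        return cache[url]
    verdict = classify(text, keyword_weights)
    cache[url] = verdict  # plain dict as a stand-in for the cache strategy
    return verdict


blacklist = IPBlacklist()
blacklist.add("10.0")          # block a whole prefix
blacklist.add("192.168.1.7")   # block a single host
weights = {"casino": 1.0, "bet": 0.5}  # hypothetical keyword list
cache = {}
```

In this sketch the IP lookup costs at most four dictionary probes, and a page whose score falls between the two thresholds is labeled "uncertain" rather than forced into a category, which mirrors the division of work between online and offline filtering that the abstract describes.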
Keywords: Web Page Filtering, Online Filtering, Offline Filtering, Adaptive Information Filtering, Semantic Orientation