Research And Design On BHO-Based Content Filtering System

Posted on: 2014-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: R B Li
Full Text: PDF
GTID: 2248330398959559
Subject: Computer application technology
Abstract/Summary:
The Internet plays an increasingly important role in people's lives, work and learning. However, while it provides convenience, it has also created many hidden dangers for politics, morality and the rule of law, because it is flooded with harmful information of all kinds. This thesis studies how to build a filtering barrier between users and Internet information, so that harmful information is intercepted before it reaches the user's view.
Web pages are an important carrier of network information, so the thesis centers on how to implement a real-time online web content filtering system, and develops a Chinese web content filter based on BHO (Browser Helper Object) technology. In view of the current state of research and its open problems, the following work was carried out.
Web content filtering is a real-time analysis technique and places high demands on the algorithm. After studying several common keyword matching algorithms, the author found that most of them are designed for text in small character sets and perform poorly on a large character set such as Chinese. Most are also suited only to offline content filtering; when used for real-time filtering, they slow page loading enough to affect the user experience. To speed up forbidden-word matching, the thesis organizes the forbidden words using the idea of a hash function, tying each word's content to its storage address to accelerate retrieval, and matches forbidden words by combining word segmentation with a prefix matching algorithm and binary search.
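The abstract does not give the actual data layout, so the following is only a minimal sketch of one way to combine a hash lookup, prefix matching and binary search. It assumes the forbidden words are bucketed by their first character and that each bucket is kept sorted. The names `build_index` and `find_forbidden` and the sample words are hypothetical and not taken from the thesis.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(forbidden_words):
    """Bucket words by first character (the hash key) and sort each bucket.

    Returns {first_char: (sorted_words, longest_word_length)}.
    This layout is an assumption; the thesis does not specify one.
    """
    buckets = defaultdict(set)
    for word in forbidden_words:
        if word:
            buckets[word[0]].add(word)
    return {
        ch: (sorted(words), max(len(w) for w in words))
        for ch, words in buckets.items()
    }


def in_bucket(sorted_words, candidate):
    """Binary search for an exact match inside one sorted bucket."""
    pos = bisect_left(sorted_words, candidate)
    return pos < len(sorted_words) and sorted_words[pos] == candidate


def find_forbidden(text, index):
    """Return (offset, word) for every forbidden word that occurs in text."""
    hits = []
    for start, ch in enumerate(text):
        entry = index.get(ch)  # hash lookup on the one-character prefix
        if entry is None:
            continue  # no forbidden word starts with this character
        sorted_words, max_len = entry
        last_end = min(len(text), start + max_len)
        for end in range(start + 1, last_end + 1):
            candidate = text[start:end]
            if in_bucket(sorted_words, candidate):
                hits.append((start, candidate))
    return hits


index = build_index(["赌博", "赌场", "暴力"])
print(find_forbidden("网上赌博和暴力内容", index))
```

Keying on the first character means most positions in a page are rejected with a single dictionary lookup, which suits a large character set such as Chinese, where the first character alone already narrows the candidates sharply.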
At the same time, to speed up matching, the filter first analyzes the structure of the page and removes the noise surrounding its body text. Text filtering can also be disrupted by disguised vocabulary; the thesis uses character encoding rules to exclude interfering characters in a single preprocessing scan, which improves the accuracy of the filter.
The system uses a two-level filtering mechanism. The first level is URL filtering: the URL of the page being accessed is obtained from the browser through BHO technology and matched against black and white lists, and the URLs are hashed to improve efficiency. The second level is keyword-combination filtering, carried out by the forbidden-word matching algorithm.
The thesis gives a detailed design and implementation of this scheme and describes the overall framework of the system, its functional modules, and the filtering algorithms and specific strategies. Finally, the implemented system was tested; the experiments show that it has good filtering performance and running speed.
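The two-level mechanism and the one-pass preprocessing can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: Python sets stand in for the hashed URL lists, "interfering characters" are taken to be anything outside the CJK ideograph range U+4E00 to U+9FFF, the whitelist is checked before the blacklist, and a page is blocked when every word of some keyword combination appears. All function names, hosts and sample words are hypothetical.

```python
from urllib.parse import urlsplit


def strip_interference(text):
    """One scan that keeps only CJK ideographs (assumed rule, U+4E00..U+9FFF)."""
    return "".join(ch for ch in text if "\u4e00" <= ch <= "\u9fff")


def url_verdict(url, whitelist, blacklist):
    """First level: 'allow', 'block', or None if the host is on neither list."""
    host = (urlsplit(url).hostname or "").lower()
    if host in whitelist:  # set membership is a hash lookup
        return "allow"
    if host in blacklist:
        return "block"
    return None


def keyword_verdict(body_text, combinations):
    """Second level: block if all words of any combination occur in the text."""
    cleaned = strip_interference(body_text)
    for combo in combinations:
        if all(word in cleaned for word in combo):
            return "block"
    return "allow"


def filter_page(url, body_text, whitelist, blacklist, combinations):
    """Run the URL filter first; fall back to keyword filtering if undecided."""
    verdict = url_verdict(url, whitelist, blacklist)
    if verdict is not None:
        return verdict
    return keyword_verdict(body_text, combinations)


whitelist = {"good.example"}
blacklist = {"bad.example"}
combinations = [("赌博", "下注")]
print(filter_page("http://other.example/", "赌*博，立即下.注", whitelist, blacklist, combinations))
```

Running the URL check first lets listed pages be decided without reading the body text at all. One trade-off of this simple stripping rule is that removing punctuation can join characters across sentence boundaries, so a real filter would likely need a more careful rule.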
Keywords/Search Tags: BHO, Content Filtering, URL Filtering