Research And Development Of Webpage Text Keywords Filtering

Posted on: 2013-12-02
Degree: Master
Type: Thesis
Country: China
Candidate: R M Jin
GTID: 2248330374469991
Subject: Computer Science and Technology
Abstract/Summary:
Information technology is now highly developed, and the Internet has become an important medium for accessing information and for instant communication. It has brought great convenience to people's work and daily life. However, because it is global, open, and real-time, the Internet has also become an important channel for lawbreakers to spread harmful information, and it is harder to manage than traditional media. How to manage the Internet effectively, filter its information, and purify the online environment has become a pressing problem.
On the Internet, seventy percent of network information is in text form, so webpage text filtering has become the most important means of network monitoring. There are many webpage text filtering methods. Keyword filtering is the most widely used and most reliable of them; although it has some limitations, it is widely applied because it filters quickly and is easy to implement. This thesis addresses the limitations of webpage keyword filtering, proposes a reasonable solution, and designs a system.
The thesis briefly introduces the background of webpage text filtering systems. It systematically analyzes the key concerns at each step of the process, including extension of the concept, analysis of the characteristics of sensitive words, and feature extraction, and proposes appropriate solutions. Drawing on the advantages of other text filtering methods and taking full account of recall, precision, and operation, it presents an improved webpage text filtering system architecture, describes a multi-level webpage text keyword filtering method, and details the mathematical models and related algorithms of the system's main modules.
The system consists of two parts: a packet capture and reassembly subsystem and a webpage text keyword filtering subsystem.
The packet capture and reassembly subsystem captures web data packets through the network card on the LAN and reassembles the captured packets to restore complete web pages. The design of the text keyword filtering subsystem analyzes in detail the main ways in which sensitive words are hidden in web pages. It improves the dictionary-based sensitive word recognition method so that disguised sensitive words are restored to their original combined form. It also improves the keyword weighting algorithm, giving a better treatment of the weights for different locations in a web page and of the weights for short documents. Experimental results show a better filtering effect.
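The abstract does not give the thesis's formulas, so the following is only a minimal Python sketch of the two ideas named above: restoring disguised sensitive words before dictionary matching, and weighting hits by their location in the page. The dictionary entries, the weights, the set of separator characters, and the names `restore`, `score_page`, and `is_blocked` are all illustrative assumptions, not the author's method. The short-document adjustment is left out because the abstract does not describe it.

```python
import re

# Hypothetical dictionary: sensitive word -> base weight (illustrative values).
SENSITIVE_WORDS = {"gamble": 1.0, "casino": 0.5}

# Hypothetical location weights: a hit in the title counts more than in the body.
LOCATION_WEIGHTS = {"title": 3.0, "body": 1.0}

# Characters assumed to be inserted between letters to disguise a word.
NOISE = re.compile(r"[\s\*\-_\.\|/]+")


def restore(text: str) -> str:
    """Lower-case the text and drop separator characters, so that a
    disguised form such as 'G-a-m*b l e' collapses back to 'gamble'."""
    return NOISE.sub("", text.lower())


def score_page(title: str, body: str) -> float:
    """Sum location weight * word weight * hit count over both locations."""
    total = 0.0
    for location, text in (("title", title), ("body", body)):
        restored = restore(text)
        for word, weight in SENSITIVE_WORDS.items():
            # Substring counting on the restored text; word boundaries are
            # lost once spaces are removed, which is a known trade-off.
            total += LOCATION_WEIGHTS[location] * weight * restored.count(word)
    return total


def is_blocked(title: str, body: str, threshold: float = 2.0) -> bool:
    """Flag the page when its weighted score reaches the assumed threshold."""
    return score_page(title, body) >= threshold
```

With these assumed values, `score_page("G-a-m*b l e now", "visit the c.a.s.i.n.o today")` returns 3.5 (3.0 for the title hit plus 0.5 for the body hit), so `is_blocked` flags the page. Removing all whitespace makes the matcher robust to inserted separators but can create false matches across word boundaries, which is one reason a real system would likely combine it with further filtering levels.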
Keywords/Search Tags:information filter, keyword matching, sensitive information, information security