| With the rapid development of information networks, the Internet has become the largest and richest source of information. People benefit from the information it provides, but they are also troubled by "irrelevant information" and "junk information"; moreover, a large amount of "harmful information" threatens the health of many minors. Many countries and regions have recognized the seriousness of this issue and have taken measures to filter network information. Filtering is one way to help users obtain the information that best fits their needs. The function of information filtering is to select relevant information, or eliminate irrelevant information, from the dynamic information flow on the Internet according to certain criteria. This paper studies how to implement a real-time online web text filtering system. First, it describes the fundamentals of network information filtering, including the basic principles of information filtering, information retrieval models, and the performance evaluation of filtering systems. Second, it focuses on the key techniques of web text filtering, including how to extract page text, how to segment Chinese words, and how to extract suitable features from documents. On this basis, the paper proposes a new technical solution: a three-level web text filtering system based on a Browser Helper Object (BHO). The first level is URL filtering, the second level is keyword filtering, and the third level is web content filtering. The paper then presents the design and implementation, describing the overall framework of the system, its functional modules, and the relevant filtering algorithms. Finally, the system is tested; the results show that it achieves good filtering performance and speed. |
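
The three-level cascade described above (URL, then keyword, then content) can be illustrated with a minimal Python sketch. It is not the paper's BHO implementation: the host list, keyword list, term weights, threshold, and all function names are hypothetical, and the third level uses a simple weighted-term score as a stand-in because the abstract does not specify the content-filtering algorithm. Whitespace tokenization is also an assumption; Chinese text would first need word segmentation.

```python
from urllib.parse import urlparse

# Hypothetical example data; the paper's actual lists and thresholds are not given.
BLOCKED_HOSTS = {"bad.example.com"}
BLOCKED_KEYWORDS = {"forbiddenword"}
TERM_WEIGHTS = {"gambling": 0.6, "casino": 0.5, "bet": 0.3}
CONTENT_THRESHOLD = 1.0


def url_blocked(url):
    """Level 1: block if the URL's host is on the blocklist."""
    host = urlparse(url).hostname or ""
    return host.lower() in BLOCKED_HOSTS


def keyword_blocked(text):
    """Level 2: block if any banned keyword occurs anywhere in the text."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in BLOCKED_KEYWORDS)


def content_blocked(text):
    """Level 3 (stand-in): block if the summed weights of matched terms
    reach the threshold. Tokens are split on whitespace (an assumption)."""
    tokens = text.lower().split()
    score = sum(TERM_WEIGHTS.get(token, 0.0) for token in tokens)
    return score >= CONTENT_THRESHOLD


def filter_page(url, text):
    """Run the levels in order and return the name of the level that
    blocked the page, or 'allow' if every level passed."""
    if url_blocked(url):
        return "url"
    if keyword_blocked(text):
        return "keyword"
    if content_blocked(text):
        return "content"
    return "allow"
```

Ordering the levels from cheapest to most expensive lets most blocked pages be rejected before any text analysis runs, which is one plausible reason for a cascade in a real-time filter.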