Chinese Web Text Filtering Technology Research

Posted on:2011-04-15

Degree:Master

Type:Thesis

Country:China

Candidate:L J Wang

GTID:2178360308981408

Subject:Computer application technology

Abstract/Summary:

With the growing popularity of the Internet, people depend on the network more and more. The Internet's equality, openness, and lack of boundaries have also led to unchecked abuse: large amounts of junk and sensitive information flood the network, and some of this "harmful information" threatens the physical and mental health of users, especially young students. How to help users make more convenient and effective use of available network resources and obtain useful information is a research direction in information processing.

Current web filtering systems mainly use URL filtering and keyword filtering, but these techniques fall short in both accuracy and speed. Improving the accuracy and speed of web filtering requires in-depth analysis of web page content. A web page is a structured document, and the DOM is a programming interface for flexibly manipulating HTML and XML documents. Based on a detailed analysis of web page structure, this thesis proposes parsing pages according to their structure and using the DOM to extract the text content held in the different elements of the document.

The thesis first presents the fundamentals of web information filtering, including its basic principles, the general processing flow of a filtering system, the classification of filtering systems, and their performance evaluation metrics. It then analyzes and discusses in depth the key technologies involved in web content filtering, mainly Chinese word segmentation, text feature extraction, the representation and updating of user interest models, and text filtering techniques. To address the poor extraction performance of current web information extraction techniques, the thesis proposes an adaptive information extraction method based on the HTML tree and content analysis.
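
The idea of extracting text separately from different elements of a page can be illustrated with a short Python sketch. It is not the thesis's implementation: it uses the standard library's event-based `html.parser` as a stand-in for a full DOM traversal, and the names `ElementTextExtractor`, `extract_text_by_element`, `SKIPPED`, and `VOID` are hypothetical. The choice to discard `script` and `style` content is likewise an assumption.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements whose text is ignored entirely (an assumption, not from the thesis).
SKIPPED = {"script", "style"}
# Void elements have no closing tag, so they are never pushed on the stack.
VOID = {"br", "img", "hr", "meta", "link", "input"}


class ElementTextExtractor(HTMLParser):
    """Collect text grouped by the tag of the innermost open element."""

    def __init__(self):
        super().__init__()
        self.stack = []                       # tags of currently open elements
        self.text_by_tag = defaultdict(list)  # tag -> list of text fragments

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerate unclosed children.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.stack:
            return
        if any(t in SKIPPED for t in self.stack):
            return
        self.text_by_tag[self.stack[-1]].append(text)


def extract_text_by_element(html):
    """Return a dict mapping each tag name to the text found directly in it."""
    parser = ElementTextExtractor()
    parser.feed(html)
    parser.close()
    return {tag: " ".join(parts) for tag, parts in parser.text_by_tag.items()}


if __name__ == "__main__":
    page = (
        "<html><head><title>News</title><style>p{color:red}</style></head>"
        "<body><h1>Headline</h1><p>First <b>bold</b> text.</p></body></html>"
    )
    for tag, text in extract_text_by_element(page).items():
        print(tag, "->", text)
```

Keeping the text grouped by element, rather than flattening the page into one string, is what lets a later stage treat a title differently from body text.
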
The standard vector space model ignores the weight of page structure when filtering, which lowers filtering performance. To address this, the thesis improves the vector space model's representation of the text vector; experimental results show that the improved model is better suited to filtering web page text. Building on this research, a prototype Chinese web page filtering system was designed. The thesis details the system's overall framework, its functional modules, and the main methods used to implement it. Finally, the system was tested, and the experiments show that it performs well at information filtering.
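
One way to make a vector space model aware of page structure is to scale each term's count by a weight tied to the element it appears in, then compare the page vector with a user interest vector by cosine similarity. The Python sketch below illustrates that general idea only. The abstract does not give the thesis's actual weighting formula, so the values in `TAG_WEIGHTS`, the function names, and the assumption that Chinese word segmentation has already produced token lists are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical structural weights; the abstract does not state any values.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "b": 1.5}
DEFAULT_WEIGHT = 1.0


def weighted_vector(tokens_by_tag):
    """Build a term vector where each occurrence counts by its element's weight.

    tokens_by_tag maps a tag name to the list of tokens found in that element
    (word segmentation is assumed to have been done upstream).
    """
    vec = Counter()
    for tag, tokens in tokens_by_tag.items():
        weight = TAG_WEIGHTS.get(tag, DEFAULT_WEIGHT)
        for token in tokens:
            vec[token] += weight
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse term vectors (dict-like)."""
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    page = {"title": ["sports"], "p": ["sports", "news", "weather"]}
    profile = {"sports": 1.0}  # hypothetical user interest vector
    print(cosine(weighted_vector(page), profile))
```

In this example the term in the title contributes more than the same term in a paragraph, so the page scores higher against a matching interest profile than it would with plain term counts. A filter would then compare the score with a threshold chosen for the application.
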

Keywords/Search Tags:

information extraction, web filtering, DOM tree, vector space model, information filtering

Related items

1	Study And Application Of Chinese Information Filtering System
2	Information Filtering Systems Based On Web Text Content And Design
3	The Research And Implementation Of The Chinese Text Information Filtering Based On Web Content
4	Web Page Information Filtering Method Research Based On Vector Space Model
5	Research On Key Technology For Information Filtering Based On Vector Space
6	Researches On Multimedia Information Filtering Technologies In The Internet
7	Application Research Of Information Filtering Technique In Education Network
8	Research On Technique Of Information Filtering Based On CoP Modeling
9	Research On A Filtering System For Harmful Web Information