Study on an Information Filtering Method Based on Text Categorization

Posted on: 2009-06-19
Degree: Master
Type: Thesis
Country: China
Candidate: H B Niu
Full Text: PDF
GTID: 2178360245986375
Subject: Computer software and theory
Abstract/Summary:
The growth of the Internet brings convenience, but it also brings problems, including information overload, information loss, and pornographic and violent content. To address these problems, research on information filtering has drawn much attention. Chinese text filtering is a branch of Chinese information processing research. It finds useful information in a dynamic data stream and eliminates useless or irrelevant information according to users' requests. Traditional filtering technologies, such as keyword-based or IP-address-based filtering, can no longer solve these problems effectively. This thesis therefore studies information filtering based on text categorization technology, with the aim of filtering network information securely.
The thesis applies text categorization to the information filtering domain and proposes a filtering method based on text categorization technology. First, it preprocesses Internet text; the improved method reduces the features that represent the text to pure Chinese terms. Second, drawing on the idea of the vector space model, it builds a vector space representation of the text. The attributive character of each word is then introduced into the vector space representation to analyze the overall characteristics of the text. According to the users' filtering demands, the system builds an information feature filtering model. It then judges whether a test text meets the users' filtering demands by matching the test text against the information features. At the same time, because filtering technology based on statistical characteristics neglects the semantic constraints of the text, it cannot truly analyze the text intelligently.
The thesis introduces local semantic analysis to analyze text from both its overall features and its local features, taking into account both the statistical characteristics of the document and knowledge, in order to analyze and filter text efficiently. A preliminary test succeeded in securely filtering particular information.
Experimental results indicate that the method put forward in the thesis can identify sensitive information and filter text safely. However, filtering text intelligently is a complicated and long process, and the method in this thesis is only a first step, so it should be studied further in future work.
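The vector-space matching step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes that texts have already been segmented into terms, that plain term frequency is used as the weight, that the filtering model is the summed vector of sample documents, and that a text is filtered when its cosine similarity to that model reaches a threshold. The function names, the 0.5 threshold, and the English sample terms are all hypothetical, and the word attributes and local semantic analysis mentioned in the abstract are left out.

```python
import math
from collections import Counter


def to_vector(terms):
    """Map a list of already-segmented terms to a sparse term-frequency vector."""
    return Counter(terms)


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_profile(sample_docs):
    """Sum the vectors of sample documents into one filtering profile (assumed design)."""
    profile = Counter()
    for terms in sample_docs:
        profile.update(to_vector(terms))
    return profile


def should_filter(terms, profile, threshold=0.5):
    """Return True when the text is similar enough to the profile (threshold is hypothetical)."""
    return cosine(to_vector(terms), profile) >= threshold


# Hypothetical sample terms standing in for segmented Chinese text.
profile = build_profile([["gamble", "casino", "bet"], ["casino", "bet", "win"]])
print(should_filter(["casino", "bet", "bet"], profile))
print(should_filter(["weather", "rain", "bet"], profile))
```

Under these assumptions the first test text shares most of its weight with the profile and is flagged, while the second shares only one term and passes. A purely statistical match of this kind is what the abstract says misses semantic constraints, which is the motivation for the local semantic analysis it adds.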
Keywords/Search Tags: information filter, text categorization, filtering model, vector space model