
The Research And Implementation Of Chinese Text Categorization System

Posted on: 2007-04-19    Degree: Master    Type: Thesis
Country: China    Candidate: L G Gan    Full Text: PDF
GTID: 2178360185962629    Subject: Computer application technology
Abstract/Summary:
With the development of information technology and the spread of the Internet, the number of web pages has grown explosively. Because the content of web pages is mostly text, automatically categorizing web pages by their text has become an important research subject. Text categorization, the automated assignment of natural-language texts to predefined categories based on their content, is an important part of information retrieval. This thesis first introduces the research status of text categorization. Second, it studies and discusses the key techniques of text categorization, including information retrieval models, Chinese word segmentation, feature selection, feature weighting, and classification methods. To address the disadvantages of traditional feature weighting, we use the importance of sentences to compute feature weights, and experiments show that this method benefits categorization. Third, we introduce the framework, system flow, and function modules of a Chinese text categorization system based on the vector space model. Finally, we list the results of experiments on feature selection, feature weight and classify...
Keywords/Search Tags: text categorization, vector space model, feature selection, feature weight
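
The abstract describes weighting a feature by the importance of the sentences it appears in, within a vector space model, but does not give the formula. The following is only a minimal sketch of that general idea, not the thesis's method. The position-based heuristic in `sentence_importance`, the smoothed IDF, and all function names are assumptions made for illustration. Documents are assumed to be already word-segmented into lists of tokens, one list per sentence.

```python
import math
from collections import Counter


def sentence_importance(index, total):
    """Hypothetical heuristic (not from the thesis):
    the first and last sentences of a document count double."""
    return 2.0 if index == 0 or index == total - 1 else 1.0


def weighted_tf(sentences):
    """Term frequency where each occurrence is scaled by the
    importance of the sentence it appears in.
    sentences: list of token lists (already word-segmented)."""
    tf = Counter()
    for i, tokens in enumerate(sentences):
        w = sentence_importance(i, len(sentences))
        for tok in tokens:
            tf[tok] += w
    return tf


def idf(corpus):
    """Inverse document frequency, log(N / df) + 1 (an assumed variant).
    corpus: list of documents, each a list of token lists."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        # Count each term once per document.
        df.update({tok for sent in doc for tok in sent})
    return {t: math.log(n / d) + 1.0 for t, d in df.items()}


def vectorize(doc, idf_table):
    """Build an L2-normalized sparse vector (dict) for one document."""
    tf = weighted_tf(doc)
    vec = {t: f * idf_table.get(t, 0.0) for t, f in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else vec
```

A term that occurs in an opening or closing sentence contributes more to the document vector than the same term in the middle of the text. Any classifier that accepts vector space input could then be trained on these vectors.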