The Research And Implementation Of Chinese Text Categorization System

Posted on:2007-04-19

Degree:Master

Type:Thesis

Country:China

Candidate:L G Gan

Full Text:PDF

GTID:2178360185962629

Subject:Computer application technology

Abstract/Summary:

With the development of information technology and the spread of the Internet, the number of web pages has grown explosively. Because web page content is mostly text, automatically categorizing web pages by their text has become an important research subject. Text categorization, the automated assignment of natural-language texts to predefined categories based on their content, is an important part of information retrieval. This paper first reviews the research status of text categorization. Second, it studies and discusses the key techniques of text categorization, including the information retrieval model, Chinese word segmentation, feature selection, feature weighting, and classification methods. To address the shortcomings of traditional feature weighting, we use sentence importance to compute feature weights, and experiments show that this method benefits categorization. Third, we introduce the framework, system flow, and functional modules of a Chinese text categorization system based on the vector space model. Finally, we list the experimental results on feature selection, feature weighting and classify...
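The abstract does not say how sentence importance enters the feature weight, so the following is only an illustrative sketch, not the thesis's method. It assumes each term occurrence is counted with the weight of the sentence it appears in, and that the resulting weighted term frequency is combined with a standard inverse document frequency in a vector space model. The names `sentence_weighted_tfidf` and `position_weight`, and the "first sentence counts double" heuristic, are hypothetical.

```python
import math
from collections import defaultdict


def position_weight(index, n_sentences):
    # Hypothetical heuristic (an assumption, not from the thesis):
    # the first sentence of a document counts double.
    return 2.0 if index == 0 else 1.0


def sentence_weighted_tfidf(docs, sentence_weight=position_weight):
    """Build unit-length term vectors for a small corpus.

    docs: list of documents; each document is a list of sentences;
          each sentence is a list of already-segmented tokens.
    sentence_weight: function (sentence_index, n_sentences) -> float.
    """
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    df = defaultdict(int)
    for doc in docs:
        for term in {t for sent in doc for t in sent}:
            df[term] += 1

    vectors = []
    for doc in docs:
        # Term frequency where each occurrence is scaled by the
        # importance of the sentence it occurs in.
        tf = defaultdict(float)
        for i, sent in enumerate(doc):
            w = sentence_weight(i, len(doc))
            for t in sent:
                tf[t] += w

        # Combine with IDF, then normalize to unit length (cosine space).
        vec = {t: f * math.log(n_docs / df[t]) for t, f in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({t: v / norm for t, v in vec.items()} if norm else vec)
    return vectors
```

Word segmentation is assumed to have been done beforehand, so the input is already a list of tokens per sentence. A term that occurs in every document receives weight zero under this IDF, and a term in a highly weighted sentence outweighs the same number of occurrences elsewhere.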

Keywords/Search Tags:

text categorization, vector space model, Feature Selection, Feature Weight

Related items

1	Research On Chinese Text Categorization Algorithms Based On Technology Text
2	Design And Realization Of Text Categorization System
3	Normal Weight Based Feature Selection Method In SVM Text Categorization
4	Research Of Text Categorization Based On Vector Space Model
5	On Research For Chinese Automatic Text Categorization Technology Based On VSM Model And Feature Selection
6	Research On Feature Selection Of Text Classification
7	Study For Text Categorization Based On Feature Weighting
8	Research On Classification Module Of Core Competency Assessment System
9	Research On Feature Vector Optimization Techniques In Web Text Classification
10	Research Of Text Categorization Base On Vector Space Model And Association Rules