Text Categorization Of Chinese Small Sample Based On Graph Model

Posted on:2010-06-12

Degree:Master

Type:Thesis

Country:China

Candidate:Z X Li

Full Text:PDF

GTID:2178330332487785

Subject:Computer software and theory

Abstract/Summary:


With the rapid growth of the Internet, more and more information is expressed in various forms, of which text is one of the most important. Automatic text categorization has therefore attracted the attention of researchers both in China and abroad. Many effective methods have been applied to this field, such as Naive Bayes, KNN, SVM, neural networks, and decision trees. When a large collection of texts is processed, SVM performs best, because abundant documents are available for training. When training texts are scarce, however, the performance of SVM and KNN drops. The reason is that the vector space model uses only the numeric information extracted from documents and ignores the collocation of words, that is, the relationships between them.

First, building on traditional feature selection, this thesis analyzes the differences between Information Gain and Mutual Information and then proposes a method based on the space angle to reduce the number of feature words. Second, we analyze the shortcomings of the vector space model and propose a series of graph-model-based methods, including graph-based text representation, graph similarity comparison, and Graph-KNN, to improve categorization accuracy on small samples. Third, we run many experiments on a large set of training and test documents and compare the results of Graph-KNN with those of KNN. The results show that the method does improve the accuracy of text categorization on small samples. Finally, we discuss the feasibility of applying the method in a full-text search system.
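The abstract does not say how a document is turned into a graph or how two graphs are compared, so the following Python sketch only illustrates the general Graph-KNN idea under assumptions of our own: each pre-segmented document becomes a term co-occurrence graph (nodes are words, edges link words appearing within a small window), graph similarity is a weighted mix of node and edge Jaccard overlap, and the k most similar training graphs vote with their similarity scores. The names `build_graph`, `graph_similarity`, `graph_knn`, the `window`, `alpha` and `k` parameters, and the toy data are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from typing import FrozenSet, List, Set, Tuple

# Hypothetical graph model: (nodes, edges), where nodes are terms and each
# edge is an unordered pair of distinct terms co-occurring within a window.
Graph = Tuple[Set[str], Set[FrozenSet[str]]]


def build_graph(tokens: List[str], window: int = 2) -> Graph:
    """Build a term co-occurrence graph from a pre-segmented document."""
    nodes = set(tokens)
    edges: Set[FrozenSet[str]] = set()
    for i, tok in enumerate(tokens):
        # Link each token to the next (window - 1) tokens.
        for other in tokens[i + 1 : i + window]:
            if other != tok:
                edges.add(frozenset((tok, other)))
    return nodes, edges


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def graph_similarity(g1: Graph, g2: Graph, alpha: float = 0.5) -> float:
    """Assumed measure: weighted mix of node overlap and edge overlap."""
    return alpha * jaccard(g1[0], g2[0]) + (1 - alpha) * jaccard(g1[1], g2[1])


def graph_knn(query: List[str], training: List[Tuple[List[str], str]],
              k: int = 3, window: int = 2) -> str:
    """Label the query by similarity-weighted voting of its k nearest graphs."""
    q = build_graph(query, window)
    scored = [(graph_similarity(q, build_graph(toks, window)), label)
              for toks, label in training]
    # Keep the k most similar training documents.
    neighbours = sorted(scored, key=lambda s: s[0], reverse=True)[:k]
    votes: Counter = Counter()
    for sim, label in neighbours:
        votes[label] += sim
    return votes.most_common(1)[0][0]


# Toy data: tokens stand in for the output of a Chinese word segmenter.
train = [
    (["stock", "market", "price", "rise"], "finance"),
    (["stock", "price", "fall", "market"], "finance"),
    (["team", "match", "goal", "win"], "sports"),
    (["match", "team", "win", "cup"], "sports"),
]
query = ["market", "price", "rise", "today"]
print(graph_knn(query, train, k=3))  # prints "finance"
```

Because edges record which words occur next to each other, the second training document, which shares the words "market" and "price" with the query but in different combinations, scores about 0.17 against 0.55 for the first; a pure word-overlap measure would not separate them as sharply. This is the kind of word-relationship information the abstract says the vector space model discards.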

Keywords/Search Tags:

text automatic categorization, space angle, feature selection, graph model, Graph-KNN


Related items

1	On Research For Chinese Automatic Text Categorization Technology Based On VSM Model And Feature Selection
2	Research On Key Technologies For Automatic Chinese Web Page Categorization
3	Research On Chinese Text Categorization Algorithms Based On Technology Text
4	The Research And Implementation Of Chinese Text Categorization System
5	Chinese Text Data Classification
6	Research Of Text Categorization Based On Vector Space Model
7	Design And Realization Of Text Categorization System
8	Study For Text Categorization Based On Feature Weighting
9	Research On Feature Selection Of Text Classification
10	Multi-class Scientific Literature Automatic Categorization System