Text Categorization Of Chinese Small Sample Based On Graph Model

Posted on: 2010-06-12  Degree: Master  Type: Thesis
Country: China  Candidate: Z X Li  Full Text: PDF
GTID: 2178330332487785  Subject: Computer software and theory
Abstract/Summary:
With the rapid growth of the Internet, more and more information is expressed in a variety of forms, of which text is one of the most important. Automatic text categorization has therefore attracted attention from researchers both in China and abroad. Many effective methods have been applied to this field, such as Naive Bayes, KNN, SVM, neural networks, and decision trees. When a large corpus is processed and abundant documents are available for training, SVM performs best. When training texts are scarce, however, the performance of both SVM and KNN falls. The reason is that the vector space model uses only the numeric information extracted from documents and ignores how the words are grouped together.

This thesis first analyses the difference between Information Gain and Mutual Information as traditional feature selection methods, then proposes a method based on space angle to reduce the number of feature words. Second, we analyse the shortcomings of the vector space model and present a series of methods based on a graph model, including graph model representation, similarity comparison, and Graph-KNN, to improve the accuracy of text categorization on small samples. We then run experiments on a large number of training and test documents and present the results of Graph-KNN compared with those of KNN. The results show that the method does improve the accuracy of text categorization on small samples. Finally, we discuss the feasibility of applying the method to a full-text search system.
Keywords/Search Tags: text automatic categorization, space angle, feature selection, graph model, Graph-KNN
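
The Graph-KNN idea named in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the graph is built or how similarity is measured, so everything below is an assumption made for illustration: each document is represented as a set of word co-occurrence edges, similarity is the Jaccard overlap of two edge sets, and the names `text_to_graph`, `graph_similarity`, `graph_knn`, and the `span` parameter are hypothetical. Input is assumed to be already segmented into words, which Chinese text would require as a preprocessing step.

```python
from collections import Counter


def text_to_graph(tokens, span=2):
    """Represent a document as a set of undirected co-occurrence edges.

    Assumption: two words are linked when one appears within `span`
    positions after the other. This is not taken from the thesis.
    """
    edges = set()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + span]:
            if other != word:
                edges.add(frozenset((word, other)))
    return edges


def graph_similarity(g1, g2):
    """Jaccard overlap of two edge sets (an assumed similarity measure)."""
    if not g1 and not g2:
        return 0.0
    return len(g1 & g2) / len(g1 | g2)


def graph_knn(query_tokens, training, k=3):
    """Label a document by majority vote among its k most similar graphs.

    `training` is a list of (tokens, label) pairs.
    """
    query_graph = text_to_graph(query_tokens)
    scored = sorted(
        ((graph_similarity(query_graph, text_to_graph(toks)), label)
         for toks, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Tiny illustrative training set (made up for this sketch).
training = [
    (["stock", "market", "rises"], "finance"),
    (["stock", "market", "falls"], "finance"),
    (["team", "wins", "match"], "sports"),
    (["team", "loses", "match"], "sports"),
]

predicted = graph_knn(["stock", "market", "crash"], training, k=3)
```

Unlike a bag-of-words vector, the edge set keeps which words occur near each other, which is the kind of information the abstract says the vector space model discards. Whether the thesis uses edge overlap or some other graph similarity cannot be determined from the abstract.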