
Automatic Text Multi-Classification Model Based On Class Latent Semantic

Posted on: 2007-11-10 Degree: Master Type: Thesis
Country: China Candidate: H Ye
GTID: 2178360185472664 Subject: Computer application technology
Abstract/Summary:
Information is growing very rapidly, and classifying all of this text manually is inefficient. Automated text categorization exploits the computing power of machines and can greatly improve classification efficiency. There are many mature algorithms, such as Rocchio, Naive Bayes, KNN and SVM. However, all of these algorithms are binary classifiers by nature, while many practical problems require multi-class classification. To handle multi-class classification, several binary classifiers are combined into a multi-class classifier. There are three common ways to compose such a classifier, including one-against-rest and one-against-one. Unfortunately, these schemes require many binary classifiers to be trained: a K-class problem needs K binary classifiers under one-against-rest and C(K,2) = K(K-1)/2 under one-against-one. The multi-class classifier then produces its final result from the outputs of all the binary classifiers.

To reduce the complexity of multi-class classification, we propose a new automatic text classification model based on class latent semantics. The model builds a matrix that represents the class labels of the training texts, applies latent semantic indexing, and then extracts latent semantic pairs between terms and classes by partial least squares analysis. All of these latent semantic pairs are used for multi-class classification. In our experiments the new model shows better stability and precision, and it performs slightly better than state-of-the-art algorithms such as KNN and SVM.

The main contributions of this thesis are as follows:
1. The class information of the training set is used to build the model and to extract features that benefit classification.
2. A new multi-class classification model based on latent semantics, MPLC, is proposed.
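To make the classifier counts for the two decomposition schemes concrete, the following Python sketch enumerates the binary sub-problems each scheme produces and shows one simple way to combine their outputs. The function names, the example class names, and the decision rules (highest score for one-against-rest, majority vote for one-against-one) are illustrative assumptions; the abstract does not say exactly how the binary outputs are combined.

```python
from collections import Counter
from itertools import combinations


def one_vs_rest_tasks(classes):
    """One binary task per class: that class (positive) vs. all others."""
    return [(c, "rest") for c in classes]


def one_vs_one_tasks(classes):
    """One binary task per unordered pair of classes: K*(K-1)/2 tasks."""
    return list(combinations(classes, 2))


def ovr_decide(scores):
    """One-against-rest (assumed rule): pick the class whose binary classifier scores highest."""
    return max(scores, key=scores.get)


def ovo_vote(pair_winners):
    """One-against-one (assumed rule): majority vote over every pairwise winner.

    pair_winners maps each (a, b) pair to the class its binary classifier chose.
    Ties are broken by the order in which classes first appear among the votes.
    """
    votes = Counter(pair_winners.values())
    return votes.most_common(1)[0][0]


classes = ["sports", "finance", "politics", "tech"]  # hypothetical labels
print(len(one_vs_rest_tasks(classes)), len(one_vs_one_tasks(classes)))  # 4 6
```

With K = 4 classes this gives 4 one-against-rest classifiers and 6 one-against-one classifiers, and the gap widens quadratically as K grows, which is the training cost the abstract points to.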
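The abstract does not give MPLC's equations, so the following is only a hypothetical sketch of the pipeline it describes: a one-hot class-label matrix, latent semantic indexing of the document-term matrix via truncated SVD, and then latent term-class pairs extracted by partial least squares. It assumes the PLS-SVD variant (the singular pairs of the cross-covariance between LSI document vectors and the label matrix), a per-pair inner regression for prediction, and an argmax over reconstructed class scores. The names `fit_class_latent_model`, `predict` and `k_lsi` and the toy data are all assumptions, not the thesis's method.

```python
import numpy as np


def fit_class_latent_model(X, labels, n_classes, k_lsi):
    """Hypothetical sketch: LSI on the doc-term matrix, then PLS-SVD against class labels."""
    # Class-label matrix: one row per training text, one column per class (one-hot).
    Y = np.eye(n_classes)[labels]
    # LSI: keep the top-k right singular vectors of the doc-term matrix.
    _, _, Vh = np.linalg.svd(X, full_matrices=False)
    P_lsi = Vh[:k_lsi].T                      # (n_terms, k_lsi)
    D = X @ P_lsi                             # documents in LSI space
    d_mean, y_mean = D.mean(axis=0), Y.mean(axis=0)
    Dc, Yc = D - d_mean, Y - y_mean
    # PLS-SVD (assumed variant): each singular pair of the cross-covariance
    # is one latent pair (a term-side direction W_i and a class-side direction V_i).
    U, _, Vt = np.linalg.svd(Dc.T @ Yc, full_matrices=False)
    n_pairs = min(n_classes - 1, k_lsi)       # centered Y has rank <= K-1
    W, V = U[:, :n_pairs], Vt[:n_pairs].T
    T, Q = Dc @ W, Yc @ V                     # latent scores on each side
    # Inner regression: class-side score ~ b_i * term-side score, per pair.
    b = (T * Q).sum(axis=0) / np.maximum((T * T).sum(axis=0), 1e-12)
    return P_lsi, d_mean, y_mean, W, V, b


def predict(model, X_new):
    """Map new documents through LSI and the latent pairs, then take the best class."""
    P_lsi, d_mean, y_mean, W, V, b = model
    T = (X_new @ P_lsi - d_mean) @ W
    Y_hat = y_mean + (T * b) @ V.T            # reconstructed class scores
    return Y_hat.argmax(axis=1)


# Toy data (hypothetical): 3 classes, each using its own two terms.
X = np.array([[2, 1, 0, 0, 0, 0], [1, 2, 0, 0, 0, 0],
              [0, 0, 2, 1, 0, 0], [0, 0, 1, 2, 0, 0],
              [0, 0, 0, 0, 2, 1], [0, 0, 0, 0, 1, 2]], dtype=float)
labels = np.array([0, 0, 1, 1, 2, 2])
model = fit_class_latent_model(X, labels, n_classes=3, k_lsi=3)
X_new = np.array([[3, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0],
                  [0, 0, 0, 0, 0, 2]], dtype=float)
print(predict(model, X_new))  # [0 1 2]
```

A single model yields scores for all K classes at once, so no separate binary classifier is trained per class or per class pair. This is the kind of saving the class latent semantic approach aims for, though the thesis's own construction may differ in detail.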
Keywords/Search Tags: Multi-classification, Latent Semantic Index, Latent Semantic Classification, Partial Least Squares