Text Categorization Using Graph Model Based On Clustering Analysis

Posted on: 2012-06-03
Degree: Master
Type: Thesis
Country: China
Candidate: X R Liu
GTID: 2178330338992279
Subject: Computer application technology
Abstract/Summary:
With the rapid growth of textual information, information processing has become an indispensable tool for knowledge acquisition. Text classification is an important research direction in information processing: it improves the quality of information services and helps users locate the information they need more accurately. It is also widely used in text processing and information retrieval. Text classification is a complex process whose major steps include document preprocessing, feature selection, text representation, classification algorithm design, and performance evaluation. This thesis studies and discusses in depth the key techniques involved in text classification. Several relatively mature classification algorithms have been applied to text classification, but most are based on the vector space model, in which the vector space used to represent text is very high dimensional, reaching tens of thousands of dimensions. The thesis first screens features with the χ² statistic and then, according to the distribution of the feature items, proposes a χ²-based feature clustering algorithm in which feature items with the same distribution are grouped into clustered concepts. This effectively reduces the feature dimension and eases the conflict between the high-dimensional feature space and sparse document vectors. To address the problem that the traditional vector space model treats words as isolated points, the thesis uses a graph model to build relationships among words, which to some extent resolves the tension between extracting associated features and the high dimensionality of the vector space.
Finally, taking full account of feature reduction and disambiguation, the thesis uses the KNN method to classify text on the basis of feature clustering and the graph model. The algorithm increases the classification contribution of rare words, enhances the classification performance of associated words, and reduces the dimensionality of the text vectors. It improves both classification precision and recall.
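The initial χ² screening step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard 2x2 contingency form of the χ² statistic for a term and a class; the abstract does not state the exact variant used in the thesis, and the function names `chi_square` and `term_scores` and the toy documents are hypothetical, introduced only for illustration.

```python
def chi_square(a, b, c, d):
    """Chi-square score of one term for one class (assumed standard 2x2 form).

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (e.g. the term occurs in every document).
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def term_scores(docs, labels, target):
    """Score every term against the class `target`.

    docs: list of sets of terms, one set per document
    labels: list of class labels, aligned with docs
    """
    vocab = set().union(*docs)
    scores = {}
    for term in vocab:
        a = b = c = d = 0
        for words, label in zip(docs, labels):
            has_term = term in words
            in_class = label == target
            if has_term and in_class:
                a += 1
            elif has_term:
                b += 1
            elif in_class:
                c += 1
            else:
                d += 1
        scores[term] = chi_square(a, b, c, d)
    return scores


# Toy corpus (hypothetical data, not from the thesis).
docs = [{"ball", "goal"}, {"ball", "team"}, {"stock", "market"}, {"stock", "team"}]
labels = ["sport", "sport", "finance", "finance"]
scores = term_scores(docs, labels, "sport")
```

In this toy corpus, "ball" appears only in the "sport" documents and receives a high score, while "team" is spread evenly across both classes and scores zero, so a screening threshold would keep the former and drop the latter.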
Keywords/Search Tags: Feature Clustering, Graph Model, Text Classification