Studies on Granular Computing-Based Text Classification Technology

Posted on: 2012-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Zhang
Full Text: PDF
GTID: 2248330374980811
Subject: Computer application technology
Abstract/Summary:
With the rapid expansion of online information and the continuing emergence of large numbers of electronic texts, organizing and managing this volume of information has become a major challenge. Automatic text classification studies how a machine can learn to classify unseen texts on its own, addressing the difficulties of manual classification. Current text classification research focuses on feature reduction and classifier design. Because granular computing can reduce the dimensionality of knowledge when solving complicated problems, making the knowledge easier to summarize and access, it has become a popular research field in recent years and offers a new approach to text categorization research. Unlike SVM and KNN, the rough set model of granular computing acquires knowledge by mining decision rules. Because its decision process is transparent and easy to understand, it has attracted much attention and has already been applied to text classification. Building on existing results in granular computing, this paper further studies its application to text classification and makes the following contributions:

(1) After analyzing existing feature selection methods, this paper proposes the feature distribution distance, which is based on the relationship between words and categories. Computing the distribution distance between every pair of features and then clustering features whose distributions are close effectively reduces the dimensionality of the feature space. It also avoids the problem that some samples are discarded because they contain none of the features chosen by existing feature selection methods. Experimental results show that, with SVM as the classifier, this clustering method achieves high accuracy compared with other feature selection methods.

(2) Following the principle of granulation, this paper proposes dividing the training set into different information granules to reduce the complexity of problem analysis. Then, according to the relevant principles of rough sets, it selects the features of each information granule, uses these features as condition attributes to build a synergistic matrix, and obtains the reduced feature set by searching for the most similar samples with a heuristic search method.

(3) By analyzing whether different information granules share the same condition attributes, this paper calculates the purity of each granule, which provides voting evidence for resolving inconsistent rules.

The experiments show that the study achieves some progress in feature reduction and in applying the principles of granular computing to text classification. The rules obtained by attribute reduction are easy to understand and achieve high classification accuracy.
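
The abstract does not give the formula for the feature distribution distance in item (1), so the following Python sketch is only one plausible reading, not the thesis's actual method. It assumes that a feature's "distribution" is its normalized document frequency over categories, that the distance is Euclidean, and that clustering is a simple greedy threshold pass. The function names, the toy corpus, and the threshold value are all hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def category_distributions(docs):
    """Map each term to its normalized document frequency over categories.

    docs: list of (tokens, category) pairs. (Assumed input format.)
    """
    categories = sorted({cat for _, cat in docs})
    counts = defaultdict(Counter)
    for tokens, cat in docs:
        for term in set(tokens):  # count each term once per document
            counts[term][cat] += 1
    dists = {}
    for term, per_cat in counts.items():
        total = sum(per_cat.values())
        dists[term] = [per_cat[c] / total for c in categories]
    return dists


def distribution_distance(p, q):
    """Euclidean distance between two category distributions (assumed metric)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cluster_features(dists, threshold):
    """Greedy single-pass clustering (an assumption, not the thesis's algorithm).

    A term joins the first cluster whose representative (its first member)
    lies within `threshold`; otherwise it starts a new cluster.
    """
    clusters = []
    for term in sorted(dists):
        for cluster in clusters:
            if distribution_distance(dists[term], dists[cluster[0]]) <= threshold:
                cluster.append(term)
                break
        else:
            clusters.append([term])
    return clusters


# Toy corpus, invented for illustration only.
docs = [
    (["goal", "match", "team"], "sports"),
    (["team", "score", "goal"], "sports"),
    (["stock", "market", "bank"], "finance"),
    (["bank", "score", "market"], "finance"),
]
dists = category_distributions(docs)
clusters = cluster_features(dists, threshold=0.1)
print(clusters)
```

In this sketch every term ends up in some cluster, so a document that contains at least one known term keeps at least one nonzero cluster-level feature. That matches the abstract's point that clustering, unlike selecting a subset of features, does not leave samples without any retained feature.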
Keywords/Search Tags: Granular Computing, Feature Distribution Distance, Rough Set, Information Granular