Studies On Granular Computing-based Of Text Classification Technology

Posted on:2012-12-04

Degree:Master

Type:Thesis

Country:China

Candidate:X Q Zhang


GTID:2248330374980811

Subject:Computer application technology

Abstract/Summary:


With the rapid expansion of network information and the continuing emergence of large numbers of electronic texts, organizing and managing this vast amount of information has become a major challenge. Automatic text classification studies how a machine can learn on its own to assign unknown texts to categories, thereby overcoming the difficulties of manual classification. Current text classification research focuses on feature reduction and classifier design. Because granular computing reduces the dimensionality of knowledge when solving complicated problems, making knowledge easier to summarize and access, it has become a popular research field in recent years and also offers a new approach to text categorization. Compared with SVM and KNN, the rough set model of granular computing acquires knowledge by mining decision rules. Since its decision process is transparent and easy to understand, it has attracted much attention and has already been applied to text classification. Building on existing results in granular computing, this thesis further studies its application to text classification and makes the following contributions:

(1) After analyzing existing feature selection methods, this thesis proposes a feature distribution distance based on the relationship between words and categories. By computing the distribution distance between every pair of features and then clustering features whose distributions are close, the dimensionality of the feature space can be reduced effectively. At the same time, the method avoids a problem of existing feature selection methods, namely that some samples are discarded because they contain none of the selected features. Experimental results show that, with SVM as the classifier, this clustering method achieves higher accuracy than other feature selection methods.

(2) Based on the principle of granulation, this thesis proposes dividing the training set into different information granules to reduce the complexity of problem analysis. Following the relevant principles of rough sets, it selects the features of each information granule, uses them as condition attributes to build a synergistic matrix, and obtains the feature reduction set by searching heuristically for the most similar samples.

(3) By analyzing whether different information granules share the same condition attributes, this thesis calculates the purity of each granule, which provides voting evidence for resolving inconsistent rules.

Experiments show that the study makes some progress in feature reduction and in applying the principles of granular computing to text classification. The rules obtained by attribute reduction are easy to understand and yield high classification accuracy.
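Contribution (1) does not give the exact formula for the feature distribution distance. As a minimal sketch, assume that each term is represented by its category distribution P(c | w), estimated from document frequencies, and that the distance between two terms is the total variation distance (half the L1 norm) between those distributions. Terms are then grouped with a simple threshold-based leader clustering. The function names, the 0.2 threshold and the toy corpus are all hypothetical, and the thesis may use a different distance or clustering algorithm.

```python
from collections import defaultdict

def category_distributions(docs, labels, categories):
    """Estimate P(c | w) for each term w from document frequencies."""
    counts = defaultdict(lambda: defaultdict(int))
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            counts[term][label] += 1
    dists = {}
    for term, per_cat in counts.items():
        total = sum(per_cat.values())
        dists[term] = [per_cat[c] / total for c in categories]
    return dists

def distribution_distance(p, q):
    """Assumed distance: total variation (half the L1 norm), in [0, 1]."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def cluster_features(dists, threshold):
    """Leader clustering: a term joins the first cluster whose leader is
    within `threshold`; otherwise it becomes the leader of a new cluster."""
    clusters = []  # list of (leader distribution, member terms)
    for term in sorted(dists):
        for leader, members in clusters:
            if distribution_distance(leader, dists[term]) <= threshold:
                members.append(term)
                break
        else:
            clusters.append((dists[term], [term]))
    return [members for _, members in clusters]

def to_cluster_vector(tokens, clusters):
    """Map a document onto one reduced feature per cluster (token counts)."""
    index = {t: i for i, members in enumerate(clusters) for t in members}
    vec = [0] * len(clusters)
    for t in tokens:
        if t in index:
            vec[index[t]] += 1
    return vec

# Hypothetical toy corpus for illustration only.
docs = [["goal", "match", "team"], ["match", "score", "report"],
        ["stock", "market", "report"], ["market", "price", "stock"]]
labels = ["sport", "sport", "finance", "finance"]
clusters = cluster_features(
    category_distributions(docs, labels, ["sport", "finance"]), 0.2)
print(clusters)  # [['goal', 'match', 'score', 'team'], ['market', 'price', 'stock'], ['report']]
print(to_cluster_vector(["team", "price", "unseen"], clusters))  # [1, 1, 0]
```

Because every training term lands in some cluster, a document keeps nonzero features as long as it contains any known term, which is the property contribution (1) contrasts with plain feature selection. Leader clustering depends on the order in which terms are visited (alphabetical here); an agglomerative method would remove that dependence at a higher computational cost.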
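Contribution (2) builds a matrix over condition attributes and searches heuristically for a feature reduction set, but the abstract does not define the synergistic matrix or the search procedure. This sketch substitutes the standard rough-set discernibility matrix and a greedy hitting-set heuristic that visits the most similar sample pairs (the smallest difference sets) first. It is an assumed stand-in, not the thesis's algorithm; all names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def discernibility_matrix(rows, decisions):
    """For every pair of samples with different decisions, record the set of
    condition attributes (column indices) on which they differ."""
    entries = []
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] != decisions[j]:
            diff = frozenset(a for a, (x, y) in enumerate(zip(rows[i], rows[j]))
                             if x != y)
            if diff:  # an empty set means the pair is inconsistent
                entries.append(diff)
    return entries

def heuristic_reduct(entries):
    """Greedy reduct: handle the most similar pairs (smallest entries) first,
    adding the entry's most frequent attribute whenever it is not yet covered."""
    freq = Counter(a for e in entries for a in e)
    reduct = set()
    for entry in sorted(entries, key=len):
        if not entry & reduct:
            # Highest matrix frequency wins; ties go to the lower index.
            reduct.add(max(entry, key=lambda a: (freq[a], -a)))
    # Remove attributes made redundant by later additions.
    for a in sorted(reduct):
        rest = reduct - {a}
        if all(e & rest for e in entries):
            reduct = rest
    return reduct

# Hypothetical term-presence decision table; columns: goal, stock, match.
rows = [(1, 0, 0), (0, 0, 1), (0, 1, 0), (0, 0, 0)]
decisions = ["sport", "sport", "finance", "finance"]
reduct = heuristic_reduct(discernibility_matrix(rows, decisions))
print(sorted(reduct))  # [0, 2]
```

Here the "stock" column (index 1) is dropped because "goal" and "match" alone separate the two categories. The final pruning pass leaves a reduct from which no attribute can be removed, although a greedy search does not guarantee the smallest possible reduct.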
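Contribution (3) computes a purity for information granules and uses it as voting evidence when rules conflict, without giving the formula. One possible reading, assumed here: each category's training samples form one granule, a condition pattern (a sample's values on the reduced attributes) that appears in several granules yields inconsistent rules, and a granule's purity for that pattern is the share of the pattern's occurrences it contains. The granule with the highest purity wins the vote. Function names, data and the tie-breaking rule are hypothetical.

```python
from collections import Counter, defaultdict

def granule_purity(rows, decisions, attrs):
    """Treat each category's samples as one information granule. For every
    condition pattern (values on `attrs`), return the share of its
    occurrences that falls in each granule."""
    table = defaultdict(Counter)
    for row, d in zip(rows, decisions):
        table[tuple(row[a] for a in attrs)][d] += 1
    return {
        pattern: {d: n / sum(per.values()) for d, n in per.items()}
        for pattern, per in table.items()
    }

def vote(purity, row, attrs):
    """Resolve a rule conflict by choosing the granule with the highest
    purity for the sample's pattern; return None for unseen patterns."""
    scores = purity.get(tuple(row[a] for a in attrs))
    if not scores:
        return None
    return max(sorted(scores), key=scores.get)  # ties: alphabetical first

# Hypothetical reduced table; only columns 0 and 2 survived reduction.
rows = [(1, 0, 0), (1, 1, 0), (0, 0, 1), (1, 0, 0), (0, 1, 0)]
decisions = ["sport", "sport", "sport", "finance", "finance"]
attrs = [0, 2]
purity = granule_purity(rows, decisions, attrs)
print(vote(purity, (1, 1, 0), attrs))  # sport (purity 2/3 vs 1/3)
print(vote(purity, (0, 1, 0), attrs))  # finance
print(vote(purity, (1, 1, 1), attrs))  # None
```

A sample whose pattern never appeared in training receives None here; a practical classifier would need a fallback, such as matching on fewer attributes.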

Keywords/Search Tags:

Granular Computing, Feature Distribution Distance, Rough Set, Information Granular


Related items

1	The Research Of Rough Set Theory And Granular Computing Crossing Problems
2	Granular Space And Granular Computing Of Information Systems Based On Binary Relation
3	Tolerance Granular Space And Its Applications
4	Research On Discretization Of Attributes Based On Granular Computing And Rough Set
5	Research On Granular Rough Theory And Key Issues In Computational Web Intelligence
6	Research On Theory Of Granular Computing And Its Application On Image Retrieval
7	Rough Decision Rules Reduction Based On Granular Computing
8	Research Of Granular Computing And Extension Of Variable Precision Rough Set Theory Based On Pansystems Theory
9	Research Of Rough Set And Granular Computing Theory Application In Air Tickets Recommender System
10	Study Of Data Mining Based On Rough Set And Granular Computing