
The Research Of Text-Classification Based On Rough Set Theory

Posted on: 2007-06-10  Degree: Master  Type: Thesis
Country: China  Candidate: Z Yang  Full Text: PDF
GTID: 2178360212974002  Subject: Computer application technology
Abstract/Summary:
This thesis studies text categorization within the field of information retrieval. It has two focal points: the methods of text categorization themselves, and how to apply rough set theory to automatic classification.

Rough set theory is a relatively new mathematical tool. Because it handles incomplete information systems well, it has been applied successfully in many fields. Its application to information retrieval, however, remains comparatively rare, and research using rough sets in that field is likewise limited. This thesis therefore concentrates on that topic, analyzing its current state, difficulties, and opportunities. Its basic standpoint is that rules extracted from an information system are uncertain, so the work here does not proceed by rule extraction.

We construct a new training model that treats the pre-indexed training class standards and the keywords that occur in the texts as both approximate and precise knowledge, and we draw conclusions about the relationship between training classes and key features in the rough set model. We then develop a new single-text categorization model based on rough set theory, which uses rough sets for knowledge approximation and the corresponding computation. The model also addresses the problem of measuring categorization, from the standpoint of "taking the classic rough set as the training process and the fuzzy set as the test result". Using the conclusions above about the relations between features and classes, we design a classification algorithm that incorporates the concept of rough precision into its computation and can be used for multi-class classification. We also give a double orientation approximation classification algorithm that handles degenerate cases and, at the same time, addresses feature reduction.

Finally, we develop a new multi-text categorization model based on tolerance rough set theory. The model uses rough sets for approximate semantic extension together with a properly defined relation: the upper approximation serves as the latent semantic extension, while the lower approximation serves as the core of the concept.
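
To make the roles of the upper and lower approximations concrete, the following is a minimal sketch and not the thesis's own algorithm. It assumes a commonly used tolerance relation in which two terms are related when they co-occur in at least `theta` documents; the abstract does not state which relation the thesis defines. The function names, the toy corpus, and the threshold are all hypothetical, and `rough_accuracy` uses the standard lower-to-upper size ratio, which may or may not match the "rough precision" measure mentioned above.

```python
from collections import Counter
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to its tolerance class: the term itself plus every
    term that co-occurs with it in at least `theta` documents.
    (Assumed relation; reflexive and symmetric, not transitive.)"""
    cooc = Counter()
    vocab = set()
    for doc in docs:
        terms = set(doc)
        vocab |= terms
        for a, b in combinations(sorted(terms), 2):
            cooc[(a, b)] += 1
    classes = {t: {t} for t in vocab}  # reflexivity
    for (a, b), count in cooc.items():
        if count >= theta:
            classes[a].add(b)  # symmetry
            classes[b].add(a)
    return classes


def lower_approx(classes, target):
    """Terms whose whole tolerance class lies inside `target` (the core)."""
    return {t for t, cls in classes.items() if cls <= target}


def upper_approx(classes, target):
    """Terms whose tolerance class overlaps `target` (the extension)."""
    return {t for t, cls in classes.items() if cls & target}


def rough_accuracy(classes, target):
    """Ratio |lower| / |upper|; 1.0 means `target` is exactly definable."""
    upper = upper_approx(classes, target)
    return len(lower_approx(classes, target)) / len(upper) if upper else 1.0


# Hypothetical toy corpus: each document is a list of index terms.
docs = [
    ["rough", "set", "approximation"],
    ["rough", "set", "classification"],
    ["fuzzy", "set", "classification"],
    ["text", "classification"],
]
classes = tolerance_classes(docs, theta=2)

doc = {"text", "classification"}
core = lower_approx(classes, doc)
extended = upper_approx(classes, doc)
accuracy = rough_accuracy(classes, doc)
```

In this toy corpus the document {"text", "classification"} is extended by "set", because "set" and "classification" co-occur in two documents, while only "text" remains in the lower approximation. The extension adds related terms that the document never contained, and the core keeps only terms whose associations are entirely inside the document.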
Keywords/Search Tags: rough set, text categorization model, tolerance relation, fuzzy set, double orientation approximation