Research Of Text Categorization Algorithm Based On Rough Set Theory

Posted on:2008-10-16

Degree:Master

Type:Thesis

Country:China

Candidate:H J Li


GTID:2178360215979364

Subject:Computer application technology

Abstract/Summary:

With the rapid development of Internet technology, information processing has become an indispensable tool for obtaining useful information. Text categorization is an important research field whose goal is to assign one or more suitable classes to a text based on an analysis of its content. Many methods have been applied to this field, such as SVM, KNN, Naive Bayes, and decision trees. Compared with these methods, the rough-set-based method has the following advantages. It needs no prior-probability information beyond the data set used to solve the problem. It provides a formal model that gives knowledge a clear meaning in terms of the data and that can be analyzed and processed mathematically. It can find minimal feature sets and reduce the dimensionality of the feature vector without affecting categorization accuracy. It also yields the simplest rules, whereas some other methods, such as KNN and Naive Bayes, cannot produce explicitly expressed rules, and others, such as decision trees, produce many redundant rules.

This thesis addresses the text categorization task using rough set theory. First, texts are preprocessed by word segmentation, word-frequency counting, stop-word removal, and similar steps, and feature words are then selected with the TF-IDF function. Second, the classification knowledge is represented as a decision table, with feature words as condition attributes, their weights as attribute values, and the text classes as the decision attribute. Third, decision rules are generated through attribute reduction. Finally, test texts are categorized with the obtained rules to validate the correctness of the approach.

The experimental results indicate that the approach is effective: it not only reduces the dimensionality of the feature vectors but also improves precision and recall.
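The pipeline outlined above (preprocessing, TF-IDF feature selection, decision table, attribute reduction, rule extraction, classification) can be illustrated with the minimal Python sketch below. It rests on several assumptions that are not taken from the thesis: documents are assumed to be already segmented into words (segmentation itself is not shown), the stop-word list is a toy placeholder, features are ranked by their maximum TF-IDF weight, weights are discretized into three hypothetical levels with a hypothetical threshold (rough sets need discrete attribute values), and attribute reduction uses a greedy search on the standard dependency degree gamma_B(D) = |POS_B(D)| / |U| (a QuickReduct-style heuristic), which may differ from the algorithm the thesis actually uses. All function names are illustrative.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real system needs a full list for its language.
STOP_WORDS = {"the", "a", "of", "and", "is"}

def preprocess(doc):
    """Remove stop words from a document that is already segmented into words."""
    return [w for w in doc if w not in STOP_WORDS]

def compute_idf(docs):
    """idf(w) = log(N / df(w)) over the training documents."""
    df = Counter(w for doc in docs for w in set(doc))
    return {w: math.log(len(docs) / c) for w, c in df.items()}

def weigh(doc, idf):
    """TF-IDF weights of one document (tf = count / length); unseen words are ignored."""
    counts = Counter(doc)
    total = len(doc) or 1
    return {w: c / total * idf[w] for w, c in counts.items() if w in idf}

def select_features(weight_dicts, k):
    """Keep the k words with the highest maximum TF-IDF (one possible criterion)."""
    best = defaultdict(float)
    for wd in weight_dicts:
        for w, v in wd.items():
            best[w] = max(best[w], v)
    return sorted(best, key=lambda w: (-best[w], w))[:k]

def to_row(wd, features, threshold):
    """Discretize weights into 0 = absent, 1 = low, 2 = high (hypothetical threshold)."""
    return tuple(0 if wd.get(f, 0.0) <= 0 else (1 if wd[f] < threshold else 2)
                 for f in features)

def dependency(rows, labels, attrs):
    """gamma_attrs(D): share of objects whose equivalence class has a single class label."""
    blocks = defaultdict(set)
    for row, y in zip(rows, labels):
        blocks[tuple(row[a] for a in attrs)].add(y)
    pos = sum(1 for row in rows if len(blocks[tuple(row[a] for a in attrs)]) == 1)
    return pos / len(rows)

def reduce_attributes(rows, labels, n_attrs):
    """Greedy forward selection, then backward removal of dispensable attributes.
    Returns a reduct that preserves gamma, not necessarily the smallest one."""
    full = dependency(rows, labels, range(n_attrs))
    reduct = []
    while dependency(rows, labels, reduct) < full:
        candidates = [a for a in range(n_attrs) if a not in reduct]
        reduct.append(max(candidates, key=lambda a: dependency(rows, labels, reduct + [a])))
    for a in list(reduct):
        rest = [b for b in reduct if b != a]
        if dependency(rows, labels, rest) == full:
            reduct = rest
    return sorted(reduct)

def extract_rules(rows, labels, reduct):
    """One certain rule per consistent equivalence class: reduct values -> class."""
    blocks = defaultdict(set)
    for row, y in zip(rows, labels):
        blocks[tuple(row[a] for a in reduct)].add(y)
    return {key: next(iter(ys)) for key, ys in blocks.items() if len(ys) == 1}

def classify(rules, reduct, row, default=None):
    """Fire the rule whose condition matches exactly; otherwise return a fallback."""
    return rules.get(tuple(row[a] for a in reduct), default)

# Toy usage on four hypothetical training texts.
train = [["the", "ball", "goal", "team"], ["goal", "team", "match"],
         ["stock", "market", "price"], ["market", "price", "bank", "the"]]
labels = ["sport", "sport", "finance", "finance"]
docs = [preprocess(d) for d in train]
idf = compute_idf(docs)
weights = [weigh(d, idf) for d in docs]
features = select_features(weights, k=8)
rows = [to_row(w, features, threshold=0.3) for w in weights]
reduct = reduce_attributes(rows, labels, len(features))
rules = extract_rules(rows, labels, reduct)
print([features[a] for a in reduct])  # ['goal']
test = preprocess(["the", "goal", "team", "win"])
print(classify(rules, reduct, to_row(weigh(test, idf), features, 0.3)))  # sport
```

On this four-document toy corpus, the eight selected feature words collapse to a one-attribute reduct, which mirrors the dimensionality reduction described in the abstract. The rule for the absent value ("goal" = 0 gives finance) also shows why a realistic corpus needs many more training texts than this sketch uses.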

Keywords/Search Tags:

Text classification, Feature selection, Rough set, Attribute approximation, Decision rules

Related items

1	Research On Text Emotion Classification Based On Rough Set
2	The Research On Text Classification Technology Based On The Rough Set Theory
3	Research On Optimization Of Text Classification Based On Improved Rough Set Model
4	Research On Integrated Classification Algorithm Based On Rough Set Attribute Reduction
5	Study And Application Of Attribute Reduction Algorithms Based On Rough Sets
6	Research On Multi-objective Attribute Reduction Based On Decision Rough Set Model
7	Studies On Feature Selection Method Based On Heuristic Attribute Reduction Of Rough Set
8	The Study On Approaches Of Mining Classification Rules Based On Rough Sets Theory And Intelligent Computing
9	The Research Of Chinese Text Categorization Based On Rough Set In Spam Filtering
10	Attribute Reduction Based On Rough Set Theory And Research On Classification Algorithm Of Decision Tree