The Research Of Text Categorization Based On Rough Set

Posted on:2006-06-07

Degree:Master

Type:Thesis

Country:China

Candidate:J L Lu

Full Text:PDF

GTID:2168360155956973

Subject:Computer applications

Abstract/Summary:

With the rapid development of Internet technology, information processing has become an indispensable tool for obtaining useful information. Text categorization is an important research field whose goal is to assign one or more suitable classes to a text based on an analysis of its content. Many methods have been applied to this field, such as SVM, KNN, Naive Bayes, and Decision Tree. Compared with these methods, the method based on rough set has the following advantages: it needs no prior-probability information beyond the data sets used to solve the problem; it includes a formal model, which gives knowledge an explicit meaning in terms of data and allows it to be analyzed and processed mathematically; it can obtain the minimum feature sets; it can reduce the dimensions of the feature vector without affecting text categorization accuracy; and it can produce the simplest rules. Among the other methods, some cannot produce explicitly expressed rules, such as KNN and Naive Bayes, and some produce too many redundant rules, such as Decision Tree.

This paper carries out the text categorization task using the mature reduction theory of rough set. It mainly completes the following work:

I. Preprocessed the documents, including word segmentation, part-of-speech tagging, frequency statistics, and position marking;

II. Employed the double comparing method, which is widely used in the policy-making area, to extract features. The double comparing method simplified the feature extraction algorithm and increased its precision. It is also an innovation of this paper;

III. Took into account the influences of position and of inverse document frequency within the same class and across different classes, improved the Okapi term weighting formula, and separated the term weights;

IV. According to the boundary between categories, used attributes reduction and relative reduction to reduce the dimensions of the feature vectors, which is the key task of this paper;...
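As an illustration of what attributes reduction means in this setting, the following is a minimal Python sketch of a relative reduct computed with a generic greedy positive-region heuristic. It is not the algorithm of the thesis, which the abstract does not spell out. The function names (`partition`, `positive_region_size`, `greedy_relative_reduct`) and the toy data are assumptions made for this example, and each feature is assumed to be already discretized.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def positive_region_size(rows, labels, attrs):
    """Count rows whose equivalence class under attrs carries a single label."""
    return sum(len(block) for block in partition(rows, attrs)
               if len({labels[i] for i in block}) == 1)

def greedy_relative_reduct(rows, labels):
    """Hypothetical helper: pick attributes greedily until the positive region
    is as large as with all attributes, then drop any redundant ones."""
    all_attrs = list(range(len(rows[0])))
    target = positive_region_size(rows, labels, all_attrs)
    chosen = []
    while positive_region_size(rows, labels, chosen) < target:
        best = max((a for a in all_attrs if a not in chosen),
                   key=lambda a: positive_region_size(rows, labels, chosen + [a]))
        chosen.append(best)
    # Backward pass: remove attributes that are no longer needed.
    for a in list(chosen):
        rest = [c for c in chosen if c != a]
        if positive_region_size(rows, labels, rest) == target:
            chosen = rest
    return sorted(chosen)

# Toy decision table (assumed data): each row is a document, each column a
# discretized term feature, and labels holds the document class.
rows = [(1, 0, 1), (1, 0, 0), (0, 1, 1), (0, 1, 0)]
labels = ["sports", "sports", "finance", "finance"]
reduct = greedy_relative_reduct(rows, labels)
print(reduct)
```

In this toy table the first feature alone separates the two classes, so the other two columns can be dropped without changing which documents are classified unambiguously. A greedy pass like this yields a reduct but is not guaranteed to find the smallest one.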

Keywords/Search Tags:

Text Categorization, Double Comparing, Rough Set, Attributes Reduction, Relative Reduction

Related items

1	Research Of Categorization Algorithm Based On Rough Sets Theory Attributes Reduction
2	Research Of Attributes Reduction And Samples Reducing Algorithm Based On Neighborhood Rough Sets And Application In Text Categorization
3	Research And Application Of Categorization Algorithms Based On Rough Sets Attributes Reduction
4	Study Of Web Text Mining Based On Rough Set Theory
5	The Research Of Text Categorization With Rough Set Based On Extracting Double Features And Heuristic Algorithm Reduction
6	The Research And Application Of Rough Set In Text Categorization System
7	Study On Attributes Reduction Of Gene Signals Based On The Rough Set Theory
8	Study On Data Reduction Based On Rough Set And Its Application In Modern Remote Education
9	Study Of Attributes Reduction In Rough Sets Theory
10	Rough Set Attributes Reduction And Its Several Applications In Power System