Study On Chinese Text Classification Algorithm Based On Rough Set And Its Application

Posted on:2011-11-24

Degree:Master

Type:Thesis

Country:China

Candidate:B F Zhang

Full Text:PDF

GTID:2178360302993792

Subject:Computer application technology

Abstract/Summary:

With the rapid development of network technology, the amount of information on the network is increasing dramatically. How to organize and manage these online documents effectively has become an urgent problem, and text classification has become a key solution to it. Text classification is an important branch of text mining and has been studied in increasing depth because of its unique knowledge discovery function. It is used in a wide range of fields, such as information filtering, information retrieval, and digital library services, and it has broad application prospects.

Rough set theory can deal with fuzzy and uncertain knowledge. It can effectively analyze and process incomplete, inconsistent, and inaccurate data without any prior information. Knowledge can thus be analyzed with a mathematical method, so that implicit knowledge is discovered and potential rules are revealed. The main idea of rough set theory is to reduce the dimension of the feature vectors without affecting classification accuracy, and to obtain the simplest classification rules.

This paper studies text classification based on rough set theory systematically and in depth, and applies the resulting algorithms to the classification system in the Public Security Intelligence System. The main work of the paper is as follows:

(1) The relevant text classification techniques are described, and several commonly used text classification algorithms are analyzed and compared in detail.

(2) The traditional TFIDF method treats the document set as a whole and does not take into account the distribution of features among and within classes. To address this problem, an improved TFIDF method combined with information entropy is proposed. The method modifies the TFIDF calculation of feature weights by incorporating the information entropy of features among and within classes. This overcomes the defect that features contributing little to categorization are given large weights, and it calculates the weights of text features more efficiently.

(3) The traditional feature selection method, which filters features using a frequency threshold, causes information loss and reduces classification precision. To address this problem, a novel automatic text categorization method based on rough sets is proposed. In the proposed method, the weighted attribute features are first discretized to form a decision table; conditional attributes of the decision table are then selected according to attribute significance, which is based on dependency degree; finally, the text attribute features are reduced by a heuristic algorithm based on conditional information entropy.

(4) The improved TFIDF method and the novel rough-set-based automatic text categorization method proposed in this paper were applied to the public security intelligence classification subsystem. Practical application shows that the system obtains better text classification results.
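
The decision-table steps described in item (3) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the toy data, and the 0/1 discretization are assumptions made for illustration. It computes the standard rough-set dependency degree, gamma_B(D) = |POS_B(D)| / |U|, and uses it for a plain greedy forward selection of attributes. The conditional-information-entropy heuristic mentioned in the abstract is not reproduced here.

```python
from collections import defaultdict
from fractions import Fraction


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Dependency degree: share of rows lying in label-consistent classes."""
    if not rows:
        return Fraction(0)
    positive = sum(
        len(block)
        for block in partition(rows, attrs)
        if len({labels[i] for i in block}) == 1
    )
    return Fraction(positive, len(rows))


def greedy_reduct(rows, labels, n_attrs):
    """Add the most significant attribute until the full dependency is reached."""
    full = dependency(rows, labels, list(range(n_attrs)))
    chosen = []
    while dependency(rows, labels, chosen) < full:
        candidates = [a for a in range(n_attrs) if a not in chosen]
        # Significance of a candidate = dependency gained by adding it.
        best = max(candidates, key=lambda a: dependency(rows, labels, chosen + [a]))
        chosen.append(best)
    return chosen


# Hypothetical decision table: each row is one document, each column one term
# whose weight has already been discretized (0 = low, 1 = high).
rows = [(1, 0, 1), (1, 0, 0), (0, 1, 1), (0, 1, 0), (1, 1, 1), (0, 0, 0)]
labels = ["sports", "sports", "finance", "finance", "sports", "finance"]

reduct = greedy_reduct(rows, labels, n_attrs=3)
print(reduct)
```

In this toy table the first term alone separates the two classes, so the greedy selection stops after choosing attribute 0. The result of a greedy selection is not guaranteed to be a minimal reduct in general.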

Keywords/Search Tags:

text classification, TFIDF, vector space model, Rough sets, attribute reduction

Related items

1	Research Of Attributes Reduction And Samples Reducing Algorithm Based On Neighborhood Rough Sets And Application In Text Categorization
2	Matroidal And Topological Approaches To Rough Sets
3	Based On The Chinese Text Of The Rough Set And Neural Network Classification
4	An Attribute Reduction Algorithm Based On Dynamic Neighborhood Rough Set For Text Classification
5	Rough Sets Theory And SVMs Based Multi-class Classification Algorithm
6	Text Classification Model Based On Fuzzy-Rough Sets Theory
7	Research On Accelerated Algorithm Of Attribute Reduction In Rough Sets And Its Neighborhood Model
8	Research On Attribute And Attribute Value Reduction Method Based On Rough Sets
9	Attribute Reduction Algorithm Of Neighborhood Rough Sets And Its Application In Classifier
10	Study On Attribute Reduction Based On Rough Sets And Its Application