
The Research And Application Of Rough Set In Text Categorization System

Posted on: 2008-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: S M Yang
Full Text: PDF
GTID: 2178360215972132
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of computer and communication technology, people can acquire more and more text information. How to organize and process such a large amount of document information, and how to find the information a user wants quickly, accurately, and completely, is a great challenge for information science and technology. As a key technology for organizing and processing large amounts of document data, text categorization assigns one or more suitable classes to a text based on an analysis of its content. Moreover, text categorization has broad application prospects as the technical basis of information filtering, search engines, text databases, digital libraries, and so on. This paper systematically and deeply studies a text categorization system based on rough set theory. The research results are described in detail as follows.

Rough set theory, proposed in 1982 by the Polish mathematician Z. Pawlak, is a powerful mathematical tool for analyzing uncertain and fuzzy knowledge. As a new hotspot in the field of artificial intelligence, rough sets can effectively handle the representation and inference of incomplete and uncertain knowledge. The theory requires no prior probability information beyond the data set used to solve the problem; it provides a formal model that can be analyzed and processed by mathematical methods; it can obtain a minimum feature set, reducing the dimensionality of the feature vector without affecting categorization accuracy; and it can derive the simplest rules. By contrast, some other methods, such as Naïve Bayes and KNN, cannot produce explicitly expressed rules.

(1) This paper introduces rough sets together with their related theory and methods and the basic content of text categorization; analyzes their research background and current state; and discusses future development trends and hot research fields. All of the above form the basis of the paper.

(2) On the basis of common relative reduction algorithms for rough sets and the tabu search algorithm, an improved attribute reduction algorithm is presented after studying the advantages and disadvantages of existing attribute reduction algorithms. The improved algorithm uses attribute importance as its heuristic information and can obtain a minimal reduct.

(3) In order to avoid word segmentation of the text, this paper presents a text representation method and an algorithm for extracting key words. The algorithm overcomes a problem of the GF/GL method presented by Zhang Xueying, which fails when the frequency of identical characters is 1, especially in literary texts.

This paper presents a rough-set-based text categorization system model that mainly includes a text preprocessing module, an attribute reduction module, and a rule matching module, with attribute reduction and rule matching studied in depth. Finally, simulation experiments show that rough-set-based text categorization is feasible.

The drawbacks of this paper lie in two aspects: the word dictionary and stop word list are limited, and the computation of knowledge granularity is handled only at the current research stage and has not yet formed a unified knowledge structure. Knowledge granularity importance, used as heuristic information in attribute reduction and text representation, is so far applied only to a small extent, and the study of soft computing remains an open question. Many problems are still worth further discussion. The algorithms of this paper are feasible, but the related algorithms and the simulation test system need further development.
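The abstract leans on the standard rough set machinery of indiscernibility and approximation without spelling it out. As a minimal Python sketch of those textbook definitions (not code from the thesis; the decision table, attribute names, and values below are hypothetical):

    from collections import defaultdict

    def indiscernibility_classes(table, attrs):
        """Partition objects into classes that agree on every attribute
        in `attrs` (the indiscernibility relation IND(attrs))."""
        classes = defaultdict(set)
        for obj, row in table.items():
            classes[tuple(row[a] for a in attrs)].add(obj)
        return list(classes.values())

    def approximations(table, attrs, target):
        """Lower approximation: classes wholly inside `target` (certain
        members). Upper approximation: classes intersecting `target`
        (possible members)."""
        lower, upper = set(), set()
        for cls in indiscernibility_classes(table, attrs):
            if cls <= target:
                lower |= cls
            if cls & target:
                upper |= cls
        return lower, upper

    # Hypothetical toy table: document id -> term features
    table = {1: {"freq": "high", "pos": "title"},
             2: {"freq": "high", "pos": "body"},
             3: {"freq": "low",  "pos": "body"},
             4: {"freq": "high", "pos": "body"}}
    target = {1, 2}  # documents known to belong to some class
    print(approximations(table, ["freq", "pos"], target))
    # ({1}, {1, 2, 4}): 1 certainly belongs; 2 and 4 are indiscernible

The gap between the lower and upper approximations is exactly the uncertainty rough sets model without any prior probabilities, which is the property the abstract emphasizes.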
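The improved attribute reduction algorithm of point (2) is named but not detailed in the abstract. For context, here is a minimal sketch of the standard significance-guided greedy reduction that such algorithms build on, using growth of the positive region as the attribute importance measure; the thesis's version additionally combines tabu search to escape local optima, which this sketch omits, and the toy table is hypothetical:

    from collections import defaultdict

    def positive_region_size(table, attrs, decision):
        """Count objects whose indiscernibility class under `attrs` is
        consistent, i.e. all its members share one decision value."""
        classes = defaultdict(list)
        for obj, row in table.items():
            classes[tuple(row[a] for a in attrs)].append(obj)
        return sum(len(objs) for objs in classes.values()
                   if len({decision[o] for o in objs}) == 1)

    def greedy_reduct(table, attrs, decision):
        """Forward selection guided by attribute significance: add the
        attribute that most enlarges the positive region, stopping when
        the reduct preserves the full attribute set's dependency."""
        full = positive_region_size(table, list(attrs), decision)
        reduct = []
        while positive_region_size(table, reduct, decision) < full:
            best = max((a for a in attrs if a not in reduct),
                       key=lambda a: positive_region_size(table, reduct + [a], decision))
            reduct.append(best)
        return reduct

    # Hypothetical decision table: object -> features, plus class labels
    table = {1: {"f1": 1, "f2": 0, "f3": 1},
             2: {"f1": 1, "f2": 1, "f3": 1},
             3: {"f1": 0, "f2": 0, "f3": 0},
             4: {"f1": 0, "f2": 1, "f3": 0}}
    decision = {1: "A", 2: "A", 3: "B", 4: "B"}
    print(greedy_reduct(table, ["f1", "f2", "f3"], decision))  # ['f1']

In a text categorization setting, dropping the attributes outside the reduct is what delivers the dimensionality reduction without loss of categorization accuracy that the abstract claims.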
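The rule matching module is likewise only named. A plausible minimal form, assuming rules are attribute-value conditions mined from the reduced decision table (the rule set and features below are invented for illustration), might look like this:

    def match_rules(rules, sample):
        """Return the classes of every rule whose conditions the sample
        satisfies; a fuller system would rank hits by rule confidence."""
        return [cls for cond, cls in rules
                if all(sample.get(a) == v for a, v in cond.items())]

    # Hypothetical rules mined from a reduced decision table
    rules = [({"freq": "high", "pos": "title"}, "sports"),
             ({"freq": "low"}, "finance")]
    print(match_rules(rules, {"freq": "high", "pos": "title"}))  # ['sports']

Because rough set reduction yields explicit rules of this form, the resulting classifier stays human-readable, unlike the Naïve Bayes and KNN models the abstract contrasts it with.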
Keywords/Search Tags: Rough sets, knowledge granularity, importance, text categorization, attribute reduction, matching rules