Study Of Automatic Text Classification Based On Rough Sets

Posted on:2007-11-18

Degree:Master

Type:Thesis

Country:China

Candidate:Y Zhang

Full Text:PDF

GTID:2208360185464695

Subject:Computer application technology

Abstract/Summary:

With the rapid development of the World Wide Web, the network has become an effective platform for exchanging and processing information, and more and more of that information is expressed as text. To organize and analyze massive Web information resources and help users obtain knowledge promptly, it is increasingly important to find effective text classification algorithms for classifying and organizing large-scale document collections on the Web.

Rough set theory was proposed by Z. Pawlak in 1982 as a new computational tool. It can analyze and process inaccurate, inconsistent, and uncertain information without any prior information. Since its introduction into machine learning and artificial intelligence, it has been applied successfully in knowledge acquisition, rule generation, decision analysis, pattern recognition, and data mining. This thesis studies text mining based on rough set theory in depth. The main contributions are as follows:

1. Chinese word segmentation is both the prerequisite for and a major difficulty in analyzing Chinese text. Building on earlier methods, we design a new segmentation algorithm that tags lexicon entries as useful or useless words and also takes ambiguous words into account. With this method, a small number of composite features can be extracted that represent the original information well, which reduces the dimensionality and the time complexity.
2. A new term weighting algorithm is applied to automated text categorization. The algorithm takes into account the distribution of terms both among classes and within each class.
3. A rough-set-based reduction algorithm is improved and then applied to extract text categorization rules. First, a decision table is created in which the discretized weights of the characteristic terms serve as the condition attributes of the rules. Then the categorization rules are extracted through rough set knowledge reduction. The number of extracted rules is reduced, and the accuracy and speed of text categorization are improved.
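
The decision-table and reduction step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's improved algorithm: it assumes a generic greedy reduct search driven by the standard rough set dependency degree (the share of documents in the positive region). The cut points, term weights, and class labels are hypothetical toy values.

```python
from collections import defaultdict

def discretize(weight, cuts=(0.1, 0.5)):
    """Map a real-valued term weight to an interval index (cut points are assumed)."""
    return sum(weight >= c for c in cuts)

def partition(rows, attrs):
    """Group row indices into indiscernibility classes over the given attributes."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, labels, attrs):
    """Fraction of rows in the positive region: blocks whose rows share one label."""
    pos = sum(len(b) for b in partition(rows, attrs)
              if len({labels[i] for i in b}) == 1)
    return pos / len(rows)

def greedy_reduct(rows, labels):
    """Add the attribute that most raises dependency until it matches the full set."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        best = max((a for a in all_attrs if a not in reduct),
                   key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(best)
    return reduct

# Toy decision table: one row per document, one column per term (hypothetical data).
weights = [
    [0.80, 0.60, 0.00],
    [0.70, 0.05, 0.00],
    [0.00, 0.60, 0.90],
    [0.05, 0.00, 0.70],
]
labels = ["sports", "sports", "finance", "finance"]

# Condition attributes are the discretized term weights.
rows = [tuple(discretize(w) for w in doc) for doc in weights]
reduct = greedy_reduct(rows, labels)
print("condition attributes kept:", reduct)
```

In this toy table the first term alone separates the two classes, so the search keeps a single condition attribute. Classification rules would then be read off the reduced table, one per indiscernibility class.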

Keywords/Search Tags:

Text categorization, Rough set theory, Textual feature extraction, Word segmentation, Reduction algorithm, Text clustering

Related items

1	Research Of Text Mining Based On Rough Set Theory
2	Research Of Attributes Reduction And Samples Reducing Algorithm Based On Neighborhood Rough Sets And Application In Text Categorization
3	Research Of Categorization Algorithm Based On Rough Sets Theory Attributes Reduction
4	The Research And Application Of Rough Set In Text Categorization System
5	Automatic Text Categorization Based On Rough Set Theory
6	Application Of Rough Set Theory In Chinese Text Categorization
7	Research On Text Categorization Based On Support Vector Machine
8	Research On Chinese Text Categorization Algorithms Based On Technology Text
9	Research And Implementation Of The Automatic Chinese Text Categorization
10	Text Categorization Based On Rough Set Theory