Study On Methods Of Data Mining And Text Mining Based On Rough Set

Posted on:2006-12-19

Degree:Doctor

Type:Dissertation

Country:China

Candidate:M C Wang

Full Text:PDF

GTID:1118360182975486

Subject:Management Science and Engineering

Abstract/Summary:

Data mining and text mining have recently become important research areas in information technology. Applying rough set theory, one of the soft computing technologies, to data mining and text mining has great theoretical significance and practical value. This dissertation studies methods of data mining and text mining, mainly: attribute reduction methods, clustering methods, a text classification rule extraction method, and a data mining method that fully combines rough set theory and fuzzy set theory. The main contributions are as follows:

A text fuzzy clustering algorithm that fully combines rough set theory and genetic algorithms is presented. In the clustering process, the weight parameters are also determined by the genetic algorithm, which makes the parameter choice more principled and practical.

The definition of a proximate rule is proposed and the meaning of the χ² value is discussed. A text classification rule extraction method that fully combines χ² value feature selection and rough set theory is then proposed. The method greatly improves the effectiveness and practicability of text rule extraction.

The definition of the membership function given in the related literature is improved, and rules for transforming a quantitative decision table into a qualitative decision table are proposed. These rules turn an n-dimensional quantitative decision table into an n-dimensional qualitative decision table instead of a 3n-dimensional one. This greatly reduces the computational complexity of the subsequent rough set rule extraction and improves the quality of the extracted rules.

A new text dimension reduction method based on Pattern Aggregation theory and Latent Semantic Indexing (LSI) is presented. The method first reduces the text dimension with Pattern Aggregation theory, which uses class labels, and then lowers the dimension further with the LSI method.

An improved attribute reduction algorithm based on rough set theory and Tabu search is developed. The effectiveness of the algorithm is demonstrated by experiments.

A rough set clustering method based on the knowledge simplicity degree is presented. By introducing the indiscernibility degree and the knowledge simplicity degree, the new method makes the clustering result more principled and reasonable.

The RPCL method is applied to text clustering. It determines the number of clusters automatically and performs well.
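
The abstract does not spell out the attribute reduction procedure, so the following is only a minimal sketch of the standard rough set notions it builds on: indiscernibility classes, the dependency degree of a decision attribute on a set of condition attributes, and a reduct found by plain greedy forward selection. It is not the Tabu-search algorithm of the dissertation. The function names (`partition`, `dependency`, `greedy_reduct`) and the toy decision table are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into indiscernibility classes over the given attributes."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, cond_attrs, decision):
    """Dependency degree: share of rows lying in decision-consistent classes."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, cond_attrs):
        # A class belongs to the positive region if all its rows share one decision.
        if len({rows[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def greedy_reduct(rows, cond_attrs, decision):
    """Greedy forward selection (an assumption, not Tabu search): add the
    attribute that raises the dependency degree most until it matches the
    dependency degree of the full condition attribute set."""
    target = dependency(rows, cond_attrs, decision)
    chosen = []
    while dependency(rows, chosen, decision) < target:
        best = max(
            (a for a in cond_attrs if a not in chosen),
            key=lambda a: dependency(rows, chosen + [a], decision),
        )
        chosen.append(best)
    return chosen


# Hypothetical toy decision table: binary term features and a class label.
rows = [
    {"term_a": 1, "term_b": 0, "term_c": 1, "label": "sports"},
    {"term_a": 1, "term_b": 1, "term_c": 1, "label": "sports"},
    {"term_a": 0, "term_b": 1, "term_c": 1, "label": "finance"},
    {"term_a": 0, "term_b": 0, "term_c": 1, "label": "finance"},
]

print(greedy_reduct(rows, ["term_a", "term_b", "term_c"], "label"))  # ['term_a']
```

In this toy table `term_a` alone separates the two labels, so the greedy pass stops after one attribute. A greedy pass can return a superset of a minimal reduct on harder tables, which is one reason to use a search heuristic such as Tabu search instead.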

Keywords/Search Tags:

rough set, data mining, text mining, attribute reduction, clustering, categorization

Related items

1	Application Of Rough Set Theory In Chinese Text Categorization
2	Association Rule Mining Algorithm Based On Rough Set
3	Data Mining Research Of Vehicle Sales Based On Hash Quick Attribute Reduction Algorithm
4	Rough Set Data Mining Approach And Its Application Relative To Decision Problem
5	Research On Application Of Rough Set Theory In Data Mining
6	Based On Rough Set Attribute Reduction Algorithm Of Data Mining To Improve Research
7	Research On The Attribute Reduction Algorithm Based On Rough Set In Data Mining
8	The Study On Model Of Data Mining And Attribute Reduction Algorithm Based On The Rough Set Theory
9	Research On Attribute Reduction Algorithms For Data Mining Based On Rough Set
10	The Research Of Clustering Based On Rough Set Theory