Study On Methods Of Data Mining And Text Mining Based On Rough Set

Posted on: 2006-12-19 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: M C Wang | Full Text: PDF
GTID: 1118360182975486 | Subject: Management Science and Engineering
Abstract/Summary:
Data mining and text mining have recently become important research areas in information technology. Applying rough set theory, one of the soft computing technologies, to data mining and text mining has great theoretical significance and practical value. This dissertation studies methods of data mining and text mining, mainly attribute reduction methods, clustering methods, a text classification rule extraction method, and a data mining method that combines rough set theory with fuzzy set theory. The main contributions are as follows:

A fuzzy text clustering algorithm that combines rough set theory with a genetic algorithm is presented. During clustering, the weight parameters are also determined by the genetic algorithm, which makes the parameter choice more principled and practical.

A definition of the proximate rule is proposed and the meaning of the χ² value is discussed. A text classification rule extraction method that combines χ²-based feature selection with rough set theory is then proposed. The method greatly improves the effectiveness and practicality of extracting text classification rules.

The definition of the membership function given in the related literature is improved, and rules for transforming a quantitative decision table into a qualitative decision table are proposed. These rules convert an n-dimensional quantitative decision table into an n-dimensional qualitative decision table rather than a 3n-dimensional one. This greatly reduces the computational complexity of the subsequent rule extraction with rough set theory and improves the quality of the extracted rules.

A new text dimension reduction method based on Pattern Aggregation theory and Latent Semantic Indexing (LSI) is presented. The method first reduces the text dimension with Pattern Aggregation, which uses class labels, and then lowers the dimension further with LSI.

An improved attribute reduction algorithm based on rough set theory and Tabu search is developed. The effectiveness of the algorithm is demonstrated by experiments.

A rough set clustering method based on the knowledge simplicity degree is presented. By introducing the indiscernibility degree and the knowledge simplicity degree, the new method produces more principled and reasonable clustering results.

The RPCL method is applied to text clustering; it determines the number of clusters automatically and performs well.
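
To illustrate the rough set notions that attribute reduction builds on, the following minimal Python sketch computes indiscernibility classes and the dependency degree γ = |POS_C(D)| / |U| for a toy decision table. It uses the standard textbook definitions only and is not the Tabu search algorithm of the dissertation; the table, the attribute names (`term_a`, `term_b`, `term_c`, `label`), and the function names are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into indiscernibility classes over `attrs`.

    Two rows fall into the same class when they agree on every attribute
    in `attrs`.
    """
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, cond_attrs, dec_attr):
    """Dependency degree: fraction of rows in the positive region.

    A class belongs to the positive region when all of its rows share
    the same decision value.
    """
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, cond_attrs):
        decisions = {rows[i][dec_attr] for i in block}
        if len(decisions) == 1:
            positive += len(block)
    return positive / len(rows)


# Hypothetical toy decision table: binary term indicators and a class label.
table = [
    {"term_a": 1, "term_b": 0, "term_c": 1, "label": "sports"},
    {"term_a": 1, "term_b": 0, "term_c": 0, "label": "sports"},
    {"term_a": 0, "term_b": 1, "term_c": 1, "label": "finance"},
    {"term_a": 0, "term_b": 1, "term_c": 0, "label": "finance"},
    {"term_a": 1, "term_b": 1, "term_c": 1, "label": "finance"},
    {"term_a": 0, "term_b": 0, "term_c": 0, "label": "sports"},
]

gamma_all = dependency(table, ["term_a", "term_b", "term_c"], "label")
gamma_b = dependency(table, ["term_b"], "label")
gamma_ac = dependency(table, ["term_a", "term_c"], "label")
print(gamma_all, gamma_b, gamma_ac)
```

In this toy table, `term_b` alone preserves the dependency degree of the full attribute set, so {`term_b`} is a reduct. A search procedure such as Tabu search looks for small attribute subsets of this kind on larger tables.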
Keywords/Search Tags: rough set, data mining, text mining, attribute reduction, clustering, categorization