
Data Mining Method Research Based On Rough Set Theory

Posted on: 2009-12-28
Degree: Master
Type: Thesis
Country: China
Candidate: P Fu
Full Text: PDF
GTID: 2178360242492779
Subject: Computer software and theory
Abstract/Summary:
With the development of database technology and the Internet, data mining has attracted great attention in the information industry. The main reason is that large amounts of existing data can be put to wide use, and there is an urgent need to convert these data into useful information and knowledge. Traditional information processing techniques no longer meet the needs of practical applications. People need more powerful and more efficient techniques that can discover interesting knowledge in massive amounts of information to guide decision making. Rough set theory, proposed by the Polish mathematician Z. Pawlak in 1982, is a new tool for processing vague and undefined knowledge. Within the overall data mining process, rough sets are applied in the preprocessing stage. Taking the use of rough set theory in the data preparation steps of data mining as its thread, this thesis studies several problems in the theory. The main points discussed are as follows:

(1) Discretization of continuous attributes
Rough sets handle discrete attributes very well but cannot process continuous attributes directly, so continuous attributes must be discretized before rough sets are used in practical data mining applications. This thesis proposes a discretization method for continuous attributes based on a genetic algorithm (GA). The method avoids the locally optimal results that can arise with discretization methods based on support and importance. Experiments show that the algorithm attends to both global optimality and accuracy when discretizing attributes.

(2) Attribute reduction
After analyzing the shortcomings of current reduction algorithms, this thesis proposes a genetic attribute reduction algorithm based on information entropy and the core attribute set. Information entropy theory is used to preprocess the data to be reduced, which speeds up the convergence of the algorithm and improves reduction efficiency. Experiments show that the algorithm performs well and can find the optimal reduct of an information system.

(3) Rule extraction
In practical applications, the data in a database grows incrementally, so incremental reduction of rules is a topic of general interest in the field of knowledge discovery. This thesis proposes an incremental learning method based on rough set theory and decision tree techniques and compares it with the classical algorithm and the RRIA algorithm. The results show that the proposed method performs better.
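
As an illustration of the entropy-based preprocessing mentioned under point (2), the following minimal Python sketch computes the conditional entropy of a decision attribute given a set of condition attributes, and uses it to find core attributes: those whose removal increases that entropy. It is not the thesis's algorithm, which the abstract does not spell out. The decision table, the attribute names a, b, c, d, and the function names are hypothetical, and the entropy-based definition of the core is assumed to be the standard one for a consistent decision table.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(rows, cond, dec):
    """Return H(dec | cond) in bits for a list of dict rows."""
    n = len(rows)
    # Group decision values by the equivalence class induced by `cond`.
    blocks = defaultdict(list)
    for row in rows:
        blocks[tuple(row[a] for a in cond)].append(row[dec])
    h = 0.0
    for decisions in blocks.values():
        size = len(decisions)
        for count in Counter(decisions).values():
            p = count / size
            h -= (size / n) * p * log2(p)
    return h


def entropy_core(rows, cond, dec, eps=1e-12):
    """Attributes whose removal raises H(dec | cond) (assumed core definition)."""
    base = conditional_entropy(rows, cond, dec)
    core = []
    for a in cond:
        rest = [b for b in cond if b != a]
        if conditional_entropy(rows, rest, dec) > base + eps:
            core.append(a)
    return core


# Hypothetical toy decision table: d = a AND b, and c duplicates b.
TABLE = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 1, "d": 0},
    {"a": 1, "b": 0, "c": 0, "d": 0},
    {"a": 1, "b": 1, "c": 1, "d": 1},
]

if __name__ == "__main__":
    print(entropy_core(TABLE, ["a", "b", "c"], "d"))
```

In this toy table only a is a core attribute: b and c carry the same information, so either can be dropped without loss. A genetic search for a reduct could plausibly be seeded with such a core so that every candidate already contains the indispensable attributes, which is one way a preprocessing step of this kind might speed up convergence.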
Keywords/Search Tags:Data Mining, Rough Set, Discretization of Continuous Attributes, Attribute Reduction, Rules Pick-up