Research On Some Preprocessing Methods In The Field Of Data Mining

Posted on: 2017-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: B W Shi
GTID: 2348330563450526
Subject: Computer Science and Technology
Abstract/Summary:
Real-world data is often incomplete and inconsistent, and data preprocessing is used to improve the quality of data mining results. This thesis introduces the theory of rough sets and, on the basis of this theory, presents two studies:

1. A more precise definition of the positive region is proposed on the basis of the traditional attribute dependence. By dividing the boundary region accurately, the dependence of the decision attributes on the conditional attributes is strengthened, and the reduction results are obtained with a top-down heuristic search algorithm. Experiments on UCI data sets show that REPR can perform attribute reduction on decision tables more efficiently than the other classical methods.

2. Data discretization is first formalized and defined as an optimization problem. IIGR (Improved Information Gain Ratio) and SIS (Statistic Information Similarity) are then defined as objective functions based on information entropy theory, and the constraints on data discretization are given as well. Finally, a genetic algorithm is used to discretize the continuous data. Experiments on UCI data sets comparing these discretization methods with others show, with statistical significance, that the proposed methods produce far fewer discretized intervals than the others, and that the decision trees built from the discretized data are much smaller and more accurate. This indicates that the optimization-driven methods, which discretize all continuous attributes simultaneously and take the relationships among attributes into account, are effective.
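The traditional attribute dependence that the first study starts from can be illustrated with a small sketch. It computes the classical rough-set positive region and the dependency degree (size of the positive region divided by the number of objects) for a toy decision table. This is the textbook definition only, not the refined positive region or the REPR algorithm of the thesis, and the table, attribute names, and function names are hypothetical.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def positive_region(rows, cond_attrs, dec_attr):
    """Classical positive region: rows whose equivalence class under
    cond_attrs maps to exactly one decision value."""
    pos = set()
    for block in partition(rows, cond_attrs):
        decisions = {rows[i][dec_attr] for i in block}
        if len(decisions) == 1:  # the class is consistent
            pos.update(block)
    return pos


def dependency(rows, cond_attrs, dec_attr):
    """Dependency degree: |POS| / |U|."""
    return len(positive_region(rows, cond_attrs, dec_attr)) / len(rows)


# Hypothetical toy decision table (not taken from the thesis).
table = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
]

gamma_all = dependency(table, ["outlook", "windy"], "play")
gamma_windy = dependency(table, ["windy"], "play")
gamma_outlook = dependency(table, ["outlook"], "play")
```

In this toy table the last two rows agree on every conditional attribute but differ in the decision, so they fall into the boundary region and are excluded from the positive region. A heuristic reduction would typically compare such dependency values across candidate attribute subsets when deciding which attributes to keep.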
Keywords/Search Tags:Rough Set, Discretization, Attribute Reduction, Genetic Algorithm