
Analysis and Research of Attribute Selection Methods in Data Mining

Posted on: 2010-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: W W Guo
GTID: 2178360275973157
Subject: Computer application technology
Abstract/Summary:
Data mining techniques provide effective and efficient methods for data analysis and have been widely used in retailing, military operations, business intelligence, finance, and many other domains. Data mining algorithms usually require high-quality data, for example data with little redundancy, attributes that are highly correlated with the target, and low noise. Because real-world data rarely have these characteristics, data preprocessing has become one of the important tasks in data mining, and attribute selection is a key step in that preprocessing. A good attribute selection method can reduce data redundancy and dimensionality effectively and efficiently, making data mining algorithms more effective on the preprocessed data.

This thesis first introduces the basic ideas of data mining and its processing steps, then demonstrates the importance of attribute selection for data mining and describes the main steps and methods of attribute selection in detail. It then focuses on Weka, a data mining research platform, analyzing the design and implementation of its attribute selection algorithms and the attribute selection operations it provides. Next, an attribute selection method based on information gain and a genetic algorithm is presented, and its strengths and weaknesses are discussed in light of experimental results. Finally, a method based on minimum description length (MDL) and a genetic algorithm is proposed: MDL serves as the evaluation criterion for attribute subsets, and the genetic algorithm searches the space of attribute subsets, evaluating every subset during the search to decide whether it should be kept.
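The information-gain criterion scores a discrete attribute A by how much knowing it reduces the entropy of the class C: IG(C; A) = H(C) - H(C | A). The abstract does not say exactly how the thesis couples this score with the genetic algorithm, so the following is only a minimal Python sketch of the scoring step, assuming discrete (already discretized) attributes; the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(C) of a sequence of class labels, in bits."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(column, labels):
    """IG(C; A) = H(C) - H(C | A) for one discrete attribute column A."""
    n = len(labels)
    # Group the class labels by the value the attribute takes on each row.
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    # H(C | A): entropy of each group, weighted by the group's share of rows.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_attributes(rows, labels):
    """Return (attribute indices sorted by decreasing gain, per-attribute gains)."""
    n_attrs = len(rows[0])
    gains = [information_gain([r[j] for r in rows], labels) for j in range(n_attrs)]
    order = sorted(range(n_attrs), key=lambda j: gains[j], reverse=True)
    return order, gains


if __name__ == "__main__":
    # Hypothetical toy data: attribute 0 determines the class, attribute 1 is noise.
    rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
    labels = ["yes", "yes", "no", "no"]
    print(rank_attributes(rows, labels))
```

Scoring attributes one at a time like this cannot detect redundancy between attributes (two identical columns get the same high gain), which is one reason to pair an evaluator with a search over whole subsets, such as a genetic algorithm.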
The proposed MDL-based method retains the robustness and efficiency of genetic algorithms, finding good attribute subsets in a reasonably short time, while using MDL as the evaluation criterion improves the classification accuracy obtained with the selected attributes. Extensive experiments show that the method performs well on most of the data sets tested and achieves a lower average error rate than the genetic-algorithm-based attribute selection implemented in the Weka platform.
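The combination of an MDL evaluator with a genetic search can be sketched as follows. This is a hedged illustration, not the thesis's implementation: the MDL cost below (a two-part code with (k/2) log2 n bits for the model plus the empirical code length of the class labels given the selected attributes) is an assumed stand-in, and the GA operators (tournament selection, one-point crossover, bit-flip mutation, elitism) and parameter values are illustrative choices. Each individual is a bit mask over the attributes, and lower cost is better.

```python
import math
import random
from collections import Counter


def mdl_cost(rows, labels, subset):
    """Illustrative two-part MDL cost, in bits, of the class labels given `subset`.

    ASSUMPTION: this formula is a stand-in, not the thesis's criterion.
    Model part: (k / 2) * log2(n) bits, where k counts the free class-probability
    parameters of a table holding one class distribution per observed value combination.
    Data part: bits to encode each label with its cell's empirical distribution.
    """
    n = len(labels)
    n_classes = len(set(labels))
    cells = {}
    for row, label in zip(rows, labels):
        key = tuple(row[j] for j in subset)
        cells.setdefault(key, Counter())[label] += 1
    data_bits = 0.0
    for counts in cells.values():
        total = sum(counts.values())
        data_bits -= sum(c * math.log2(c / total) for c in counts.values())
    k = len(cells) * (n_classes - 1)
    model_bits = k / 2 * math.log2(n)
    return model_bits + data_bits


def genetic_search(rows, labels, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Search bit-mask attribute subsets with a simple GA, minimising mdl_cost."""
    rng = random.Random(seed)
    n_attrs = len(rows[0])
    cache = {}

    def cost(mask):
        key = tuple(mask)
        if key not in cache:  # evaluate each distinct subset only once
            subset = [j for j in range(n_attrs) if mask[j]]
            cache[key] = mdl_cost(rows, labels, subset)
        return cache[key]

    def tournament(population):
        # Binary tournament: the cheaper of two random individuals is kept.
        a, b = rng.sample(population, 2)
        return a if cost(a) <= cost(b) else b

    population = [[rng.random() < 0.5 for _ in range(n_attrs)] for _ in range(pop_size)]
    best = min(population, key=cost)
    for _ in range(generations):
        children = [list(best)]  # elitism: the best subset so far always survives
        while len(children) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            cut = rng.randrange(1, n_attrs) if n_attrs > 1 else 0  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each attribute is toggled with probability p_mut.
            child = [(not bit) if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        population = children
        best = min(population, key=cost)
    return [j for j in range(n_attrs) if best[j]], cost(best)


if __name__ == "__main__":
    # Hypothetical toy data: attribute 0 equals the class, attributes 1 and 2 are noise.
    rows = [(i % 2, (i // 2) % 2, (i // 4) % 2) for i in range(16)]
    labels = [i % 2 for i in range(16)]
    print(genetic_search(rows, labels))
```

On this toy data the subset {0} has the lowest cost of all eight masks (4 bits, against 8 for {0, 1} and 18 for the empty set), so the search should settle on it. The model term is what keeps the search from simply selecting every attribute: adding attributes can never raise the data term, but it tends to increase the number of table cells and hence the model term.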
Keywords: Data Mining, Attribute Selection, MDL, Genetic Algorithm